In-class Exercise 7: Geographically Weighted Regression (GWR)

Author

Ho Zi Jun

Published

October 14, 2024

Modified

October 17, 2024

1 Overview: Calibrating Hedonic Pricing Model for Private Highrise Properties with GWR Method

Geographically weighted regression (GWR) is a spatial statistical technique that takes non-stationary variables into consideration (e.g., climate, demographic factors, physical environment characteristics) and models the local relationships between these independent variables and an outcome of interest (also known as the dependent variable). In this hands-on exercise, we will learn how to build hedonic pricing models by using GWR methods. The dependent variable is the resale price of condominiums in 2015. The independent variables are divided into structural and locational variables.
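To make the idea of "local relationships" concrete, the sketch below fits a weighted least squares regression at a single location, with weights that decay with distance from that location. It uses simulated data and base R only. The helper gwr_at(), the Gaussian kernel and the bandwidth of 2 are illustrative assumptions, not the GWmodel implementation.

```r
# Simulated data: the effect of x on y grows from west to east,
# i.e. the relationship is spatially non-stationary.
set.seed(42)
n <- 200
u <- runif(n, 0, 10)   # x-coordinate of each observation
v <- runif(n, 0, 10)   # y-coordinate of each observation
x <- rnorm(n)
y <- 1 + (0.5 + 0.3 * u) * x + rnorm(n, sd = 0.1)

# Hypothetical helper: local fit at one regression point (u0, v0).
# Observations are weighted by a Gaussian kernel of their distance.
gwr_at <- function(u0, v0, bandwidth = 2) {
  d <- sqrt((u - u0)^2 + (v - v0)^2)
  w <- exp(-0.5 * (d / bandwidth)^2)
  coef(lm(y ~ x, weights = w))
}

west <- gwr_at(1, 5)   # local coefficients near the western edge
east <- gwr_at(9, 5)   # local coefficients near the eastern edge
rbind(west, east)
```

A single global lm(y ~ x) would report one averaged slope, whereas the two local fits recover a smaller slope in the west and a larger one in the east.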

2 The Data

Two data sets will be used in this model building exercise:

  • URA Master Plan subzone boundary in shapefile format (i.e. MP14_SUBZONE_WEB_PL), and
  • condo_resale_2015 in csv format (i.e. condo_resale_2015.csv)

3 Getting Started

Before getting started, it is important to install the necessary R packages into R and launch these R packages into the R environment.

The R packages needed for this exercise are as follows:

  • R package for building Ordinary Least Squares (OLS) regression models and performing diagnostic tests
    • olsrr
  • R package for calibrating the geographically weighted family of models
    • GWmodel
  • R package for multivariate data visualisation and analysis
    • corrplot
  • Spatial data handling
    • sf
  • Attribute data handling
    • tidyverse, especially readr, ggplot2 and dplyr
  • Choropleth mapping
    • tmap
  • Presentation-ready data summary and analytic result tables
    • gtsummary
  • Utilities for computing indices of model quality and goodness of fit
    • performance
  • Publication-ready visualisations for model parameters, predictions, and performance diagnostics
    • see

The code chunk below installs and launches these R packages into the R environment.

pacman::p_load(olsrr, GWmodel, corrplot, ggpubr, sf, spdep, tidyverse, tmap, gtsummary, broom.helpers,
               ggstatsplot, performance, sfdep, see)

4 Short note about GWmodel

The GWmodel package provides a collection of localised spatial statistical methods, namely: GW summary statistics, GW principal components analysis, GW discriminant analysis and various forms of GW regression, some of which are provided in basic and robust (outlier resistant) forms. Commonly, the outputs or parameters of GWmodel are mapped to provide a useful exploratory tool, which can often precede (and direct) a more traditional or sophisticated statistical analysis.

5 Importing the data

5.1 Importing geospatial data

The geospatial data used in this hands-on exercise is called MP14_SUBZONE_WEB_PL. It is in ESRI shapefile format. The shapefile consists of URA Master Plan 2014's planning subzone boundaries. Polygon features are used to represent these geographic boundaries. The GIS data is in the SVY21 projected coordinate system.

The code chunk below imports the MP14_SUBZONE_WEB_PL shapefile by using st_read() of the sf package. It also updates the newly imported mpsz sf object with the correct EPSG code (i.e. 3414) by using st_transform().

mpsz <- st_read(dsn = "data/geospatial",
                layer = "MP14_SUBZONE_WEB_PL") %>%
  st_transform(3414)
Reading layer `MP14_SUBZONE_WEB_PL' from data source 
  `C:\zjho008\ISSS626-GAA\In-class_Ex\In-class_Ex07\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
Note

The result above shows that the R object containing the imported MP14_SUBZONE_WEB_PL shapefile is called mpsz and that it is a simple feature object. The geometry type is MULTIPOLYGON. It is also important to note that the shapefile as read carries no EPSG information (the message reports only Projected CRS: SVY21), which is why st_transform(3414) is applied in the same pipeline.

After transforming the object, the projection of the newly transformed mpsz is verified by using st_crs() of the sf package.

The code chunk below is used to verify the newly transformed mpsz.

st_crs(mpsz)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

Notice that the EPSG code is now indicated as 3414.

ID["EPSG",3414]]

Next, the extent of mpsz is revealed by using st_bbox() of the sf package.

st_bbox(mpsz)
     xmin      ymin      xmax      ymax 
 2667.538 15748.721 56396.440 50256.334 

The results above show the extent of mpsz.

5.2 Importing the aspatial data

The condo_resale_2015 data set is in csv file format. The code chunk below uses the read_csv() function of the readr package to import condo_resale_2015 into R as a tibble data frame called condo_resale.

condo_resale <- read_csv("data/aspatial/Condo_resale_2015.csv")
Rows: 1436 Columns: 23
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (23): LATITUDE, LONGITUDE, POSTCODE, SELLING_PRICE, AREA_SQM, AGE, PROX_...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

After importing the aspatial data file into R, it is important to examine if the data file has been imported correctly.

The code chunks below use glimpse() and head() to display the data structure.

glimpse(condo_resale)
Rows: 1,436
Columns: 23
$ LATITUDE             <dbl> 1.287145, 1.328698, 1.313727, 1.308563, 1.321437,…
$ LONGITUDE            <dbl> 103.7802, 103.8123, 103.7971, 103.8247, 103.9505,…
$ POSTCODE             <dbl> 118635, 288420, 267833, 258380, 467169, 466472, 3…
$ SELLING_PRICE        <dbl> 3000000, 3880000, 3325000, 4250000, 1400000, 1320…
$ AREA_SQM             <dbl> 309, 290, 248, 127, 145, 139, 218, 141, 165, 168,…
$ AGE                  <dbl> 30, 32, 33, 7, 28, 22, 24, 24, 27, 31, 17, 22, 6,…
$ PROX_CBD             <dbl> 7.941259, 6.609797, 6.898000, 4.038861, 11.783402…
$ PROX_CHILDCARE       <dbl> 0.16597932, 0.28027246, 0.42922669, 0.39473543, 0…
$ PROX_ELDERLYCARE     <dbl> 2.5198118, 1.9333338, 0.5021395, 1.9910316, 1.121…
$ PROX_URA_GROWTH_AREA <dbl> 6.618741, 7.505109, 6.463887, 4.906512, 6.410632,…
$ PROX_HAWKER_MARKET   <dbl> 1.76542207, 0.54507614, 0.37789301, 1.68259969, 0…
$ PROX_KINDERGARTEN    <dbl> 0.05835552, 0.61592412, 0.14120309, 0.38200076, 0…
$ PROX_MRT             <dbl> 0.5607188, 0.6584461, 0.3053433, 0.6910183, 0.528…
$ PROX_PARK            <dbl> 1.1710446, 0.1992269, 0.2779886, 0.9832843, 0.116…
$ PROX_PRIMARY_SCH     <dbl> 1.6340256, 0.9747834, 1.4715016, 1.4546324, 0.709…
$ PROX_TOP_PRIMARY_SCH <dbl> 3.3273195, 0.9747834, 1.4715016, 2.3006394, 0.709…
$ PROX_SHOPPING_MALL   <dbl> 2.2102717, 2.9374279, 1.2256850, 0.3525671, 1.307…
$ PROX_SUPERMARKET     <dbl> 0.9103958, 0.5900617, 0.4135583, 0.4162219, 0.581…
$ PROX_BUS_STOP        <dbl> 0.10336166, 0.28673408, 0.28504777, 0.29872340, 0…
$ NO_Of_UNITS          <dbl> 18, 20, 27, 30, 30, 31, 32, 32, 32, 32, 34, 34, 3…
$ FAMILY_FRIENDLY      <dbl> 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0…
$ FREEHOLD             <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1…
$ LEASEHOLD_99YR       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
head(condo_resale$LONGITUDE) # view the first values of the LONGITUDE (x-coordinate) column
[1] 103.7802 103.8123 103.7971 103.8247 103.9505 103.9386
head(condo_resale$LATITUDE) # view the first values of the LATITUDE (y-coordinate) column
[1] 1.287145 1.328698 1.313727 1.308563 1.321437 1.314198

Next, summary() of base R is used to display the summary statistics of the condo_resale tibble data frame.

summary(condo_resale)
    LATITUDE       LONGITUDE        POSTCODE      SELLING_PRICE     
 Min.   :1.240   Min.   :103.7   Min.   : 18965   Min.   :  540000  
 1st Qu.:1.309   1st Qu.:103.8   1st Qu.:259849   1st Qu.: 1100000  
 Median :1.328   Median :103.8   Median :469298   Median : 1383222  
 Mean   :1.334   Mean   :103.8   Mean   :440439   Mean   : 1751211  
 3rd Qu.:1.357   3rd Qu.:103.9   3rd Qu.:589486   3rd Qu.: 1950000  
 Max.   :1.454   Max.   :104.0   Max.   :828833   Max.   :18000000  
    AREA_SQM          AGE           PROX_CBD       PROX_CHILDCARE    
 Min.   : 34.0   Min.   : 0.00   Min.   : 0.3869   Min.   :0.004927  
 1st Qu.:103.0   1st Qu.: 5.00   1st Qu.: 5.5574   1st Qu.:0.174481  
 Median :121.0   Median :11.00   Median : 9.3567   Median :0.258135  
 Mean   :136.5   Mean   :12.14   Mean   : 9.3254   Mean   :0.326313  
 3rd Qu.:156.0   3rd Qu.:18.00   3rd Qu.:12.6661   3rd Qu.:0.368293  
 Max.   :619.0   Max.   :37.00   Max.   :19.1804   Max.   :3.465726  
 PROX_ELDERLYCARE  PROX_URA_GROWTH_AREA PROX_HAWKER_MARKET PROX_KINDERGARTEN 
 Min.   :0.05451   Min.   :0.2145       Min.   :0.05182    Min.   :0.004927  
 1st Qu.:0.61254   1st Qu.:3.1643       1st Qu.:0.55245    1st Qu.:0.276345  
 Median :0.94179   Median :4.6186       Median :0.90842    Median :0.413385  
 Mean   :1.05351   Mean   :4.5981       Mean   :1.27987    Mean   :0.458903  
 3rd Qu.:1.35122   3rd Qu.:5.7550       3rd Qu.:1.68578    3rd Qu.:0.578474  
 Max.   :3.94916   Max.   :9.1554       Max.   :5.37435    Max.   :2.229045  
    PROX_MRT         PROX_PARK       PROX_PRIMARY_SCH  PROX_TOP_PRIMARY_SCH
 Min.   :0.05278   Min.   :0.02906   Min.   :0.07711   Min.   :0.07711     
 1st Qu.:0.34646   1st Qu.:0.26211   1st Qu.:0.44024   1st Qu.:1.34451     
 Median :0.57430   Median :0.39926   Median :0.63505   Median :1.88213     
 Mean   :0.67316   Mean   :0.49802   Mean   :0.75471   Mean   :2.27347     
 3rd Qu.:0.84844   3rd Qu.:0.65592   3rd Qu.:0.95104   3rd Qu.:2.90954     
 Max.   :3.48037   Max.   :2.16105   Max.   :3.92899   Max.   :6.74819     
 PROX_SHOPPING_MALL PROX_SUPERMARKET PROX_BUS_STOP       NO_Of_UNITS    
 Min.   :0.0000     Min.   :0.0000   Min.   :0.001595   Min.   :  18.0  
 1st Qu.:0.5258     1st Qu.:0.3695   1st Qu.:0.098356   1st Qu.: 188.8  
 Median :0.9357     Median :0.5687   Median :0.151710   Median : 360.0  
 Mean   :1.0455     Mean   :0.6141   Mean   :0.193974   Mean   : 409.2  
 3rd Qu.:1.3994     3rd Qu.:0.7862   3rd Qu.:0.220466   3rd Qu.: 590.0  
 Max.   :3.4774     Max.   :2.2441   Max.   :2.476639   Max.   :1703.0  
 FAMILY_FRIENDLY     FREEHOLD      LEASEHOLD_99YR  
 Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
 1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
 Median :0.0000   Median :0.0000   Median :0.0000  
 Mean   :0.4868   Mean   :0.4227   Mean   :0.4882  
 3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000  
 Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  

5.3 Converting aspatial data frame into a sf object

The condo_resale tibble data frame is aspatial, so we will convert it to an sf object. The code chunk below converts the condo_resale data frame into a simple feature data frame by using st_as_sf() of the sf package.

# Convert the condo resale tibble into a simple feature (point) object
# by using its longitude and latitude columns.
condo_resale_sf <- st_as_sf(condo_resale,
         coords = c("LONGITUDE", "LATITUDE"),
         crs = 4326) %>% # the original coordinates are in WGS84 (EPSG:4326)
  st_transform(crs = 3414) # project to SVY21, the projected CRS of Singapore (EPSG:3414)


condo_resale_sf # Condo resale sf data frame
Simple feature collection with 1436 features and 21 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 14940.85 ymin: 24765.67 xmax: 43352.45 ymax: 48382.81
Projected CRS: SVY21 / Singapore TM
# A tibble: 1,436 × 22
   POSTCODE SELLING_PRICE AREA_SQM   AGE PROX_CBD PROX_CHILDCARE
 *    <dbl>         <dbl>    <dbl> <dbl>    <dbl>          <dbl>
 1   118635       3000000      309    30     7.94          0.166
 2   288420       3880000      290    32     6.61          0.280
 3   267833       3325000      248    33     6.90          0.429
 4   258380       4250000      127     7     4.04          0.395
 5   467169       1400000      145    28    11.8           0.119
 6   466472       1320000      139    22    10.3           0.125
 7   309502       3410000      218    24     4.24          0.326
 8   468497       1420000      141    24    11.6           0.162
 9   118450       2025000      165    27     6.46          0.123
10   268157       2550000      168    31     6.52          0.609
# ℹ 1,426 more rows
# ℹ 16 more variables: PROX_ELDERLYCARE <dbl>, PROX_URA_GROWTH_AREA <dbl>,
#   PROX_HAWKER_MARKET <dbl>, PROX_KINDERGARTEN <dbl>, PROX_MRT <dbl>,
#   PROX_PARK <dbl>, PROX_PRIMARY_SCH <dbl>, PROX_TOP_PRIMARY_SCH <dbl>,
#   PROX_SHOPPING_MALL <dbl>, PROX_SUPERMARKET <dbl>, PROX_BUS_STOP <dbl>,
#   NO_Of_UNITS <dbl>, FAMILY_FRIENDLY <dbl>, FREEHOLD <dbl>,
#   LEASEHOLD_99YR <dbl>, geometry <POINT [m]>
Note

Notice that st_transform() of the sf package is used to convert the coordinates from WGS84 (i.e. crs = 4326) to SVY21 (i.e. crs = 3414).
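The difference between reprojecting and merely relabelling a CRS can be checked on a single point. The sketch below assumes the sf package is installed (it is loaded earlier in this exercise) and reuses the longitude and latitude of the first condo record shown in the glimpse() output. The object names pt_wgs84, pt_svy21 and pt_mislabelled are made up for illustration.

```r
library(sf)

# First condo record (longitude, latitude), declared as WGS84.
pt_wgs84 <- st_sfc(st_point(c(103.7802, 1.287145)), crs = 4326)

# st_transform() recomputes the coordinates in the target CRS,
# so the result is an easting and northing in metres.
pt_svy21 <- st_transform(pt_wgs84, 3414)
st_coordinates(pt_svy21)

# st_set_crs() only attaches a CRS label. The numbers stay in degrees,
# so it should not be used to convert between coordinate systems.
pt_mislabelled <- st_set_crs(st_sfc(st_point(c(103.7802, 1.287145))), 3414)
st_coordinates(pt_mislabelled)
```

The transformed point should fall inside the bounding box reported for condo_resale_sf, while the relabelled point keeps its original degree values.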

Next, head() is used to list the contents of the condo_resale_sf object.

head(condo_resale_sf)
Simple feature collection with 6 features and 21 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 22085.12 ymin: 29951.54 xmax: 41042.56 ymax: 34546.2
Projected CRS: SVY21 / Singapore TM
# A tibble: 6 × 22
  POSTCODE SELLING_PRICE AREA_SQM   AGE PROX_CBD PROX_CHILDCARE PROX_ELDERLYCARE
     <dbl>         <dbl>    <dbl> <dbl>    <dbl>          <dbl>            <dbl>
1   118635       3000000      309    30     7.94          0.166            2.52 
2   288420       3880000      290    32     6.61          0.280            1.93 
3   267833       3325000      248    33     6.90          0.429            0.502
4   258380       4250000      127     7     4.04          0.395            1.99 
5   467169       1400000      145    28    11.8           0.119            1.12 
6   466472       1320000      139    22    10.3           0.125            0.789
# ℹ 15 more variables: PROX_URA_GROWTH_AREA <dbl>, PROX_HAWKER_MARKET <dbl>,
#   PROX_KINDERGARTEN <dbl>, PROX_MRT <dbl>, PROX_PARK <dbl>,
#   PROX_PRIMARY_SCH <dbl>, PROX_TOP_PRIMARY_SCH <dbl>,
#   PROX_SHOPPING_MALL <dbl>, PROX_SUPERMARKET <dbl>, PROX_BUS_STOP <dbl>,
#   NO_Of_UNITS <dbl>, FAMILY_FRIENDLY <dbl>, FREEHOLD <dbl>,
#   LEASEHOLD_99YR <dbl>, geometry <POINT [m]>
Note

Notice that the output is a point feature data frame.

Geometry type: POINT

condo_resale_sf <- write_rds(condo_resale_sf,
  "data/rds/condo_resale_sf.rds")
condo_resale_sf <- read_rds(
  "data/rds/condo_resale_sf.rds")

6 Correlation Analysis - ggstatsplot methods

6.0.1 Visualising the relationships of the independent variables

Before building a multiple regression model, it is important to ensure that the independent variables used are not highly correlated to each other. If highly correlated independent variables are used in building a regression model, the quality of the model will be compromised. This phenomenon is known as multicollinearity in statistics.
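One way to see how multicollinearity compromises a model is to compare the standard error of the same coefficient with and without a highly correlated companion variable. The sketch below uses simulated data and base R only; all variable names are illustrative and unrelated to the condo data.

```r
set.seed(7)
n  <- 200
x1 <- rnorm(n)
x2_indep <- rnorm(n)                  # unrelated to x1
x2_coll  <- x1 + rnorm(n, sd = 0.1)   # almost a copy of x1

# Same true coefficients in both settings.
y_indep <- 1 + 2 * x1 + 2 * x2_indep + rnorm(n)
y_coll  <- 1 + 2 * x1 + 2 * x2_coll  + rnorm(n)

# Extract the standard error of the x1 coefficient from a fitted model.
se_x1 <- function(fit) summary(fit)$coefficients["x1", "Std. Error"]

se_indep <- se_x1(lm(y_indep ~ x1 + x2_indep))
se_coll  <- se_x1(lm(y_coll ~ x1 + x2_coll))
c(independent = se_indep, collinear = se_coll)
```

The coefficient of x1 is estimated far less precisely in the collinear setting, even though the data-generating process is otherwise identical.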

A correlation matrix is commonly used to visualise the relationships between the independent variables. Besides the pairs() function of R, there are many packages supporting the display of a correlation matrix. In this section, the corrplot package will be used.

The code chunk below is used to plot a correlation matrix of the independent variables in the condo_resale data frame.

corrplot(cor(condo_resale[, 5:23]), diag = FALSE, order = "AOE",
         tl.pos = "td", tl.cex = 0.5, method = "number", type = "upper")

A matrix reorder is very important for mining the hidden structure and patterns in the matrix. There are four methods in corrplot (parameter order), named: “AOE”, “FPC”, “hclust”, “alphabet”.

In the code chunk above, AOE order is used. It orders the variables by using the angular order of the eigenvectors method suggested by Michael Friendly.

From the correlation matrix, it is clear that FREEHOLD is highly correlated with LEASEHOLD_99YR. In view of this, it is more prudent to include only one of them in the subsequent model building.

In this case, LEASEHOLD_99YR is the candidate for exclusion. Note, however, that the model calibrated in Section 7 retains both variables, and their collinearity is re-examined with VIF in Section 8.2.

In the code chunk below, ggcorrmat() of ggstatsplot is used instead of the corrplot package.

ggcorrmat(condo_resale[, 5:23])

Similarly, it is observed that LEASEHOLD_99YR and FREEHOLD are highly correlated.
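Reading a large correlation plot by eye can miss pairs, so it may help to list the highly correlated pairs programmatically. The sketch below is a minimal base R example on simulated data that mimics two mutually exclusive tenure dummies. The data frame toy, the 0.7 cut-off and the simulated proportions are assumptions for illustration, not values taken from the exercise.

```r
set.seed(1)
n <- 500
tenure <- sample(c("freehold", "lease99", "other"), n,
                 replace = TRUE, prob = c(0.45, 0.45, 0.10))
toy <- data.frame(
  FREEHOLD       = as.numeric(tenure == "freehold"),
  LEASEHOLD_99YR = as.numeric(tenure == "lease99"),
  AREA_SQM       = rnorm(n, mean = 130, sd = 40)
)

cm <- cor(toy)

# Keep each pair once (upper triangle) when |r| exceeds the cut-off.
high <- which(abs(cm) > 0.7 & upper.tri(cm), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(cm)[high[, "row"]],
  var2 = colnames(cm)[high[, "col"]],
  r    = cm[high]
)
high_pairs
```

The same pattern could be applied to cor(condo_resale[, 5:23]) to obtain a table of the pairs that stand out in the plots.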

7 Building a hedonic pricing model using multiple linear regression method

The code chunk below uses lm() to calibrate the multiple linear regression model.

condo_mlr <- lm(formula = SELLING_PRICE ~ AREA_SQM + AGE    + 
                  PROX_CBD + PROX_CHILDCARE + PROX_ELDERLYCARE +
                  PROX_URA_GROWTH_AREA + PROX_HAWKER_MARKET + PROX_KINDERGARTEN + 
                  PROX_MRT  + PROX_PARK + PROX_PRIMARY_SCH + 
                  PROX_TOP_PRIMARY_SCH + PROX_SHOPPING_MALL + PROX_SUPERMARKET + 
                  PROX_BUS_STOP + NO_Of_UNITS + FAMILY_FRIENDLY + FREEHOLD + LEASEHOLD_99YR,
                  data = condo_resale_sf)
summary(condo_mlr)

Call:
lm(formula = SELLING_PRICE ~ AREA_SQM + AGE + PROX_CBD + PROX_CHILDCARE + 
    PROX_ELDERLYCARE + PROX_URA_GROWTH_AREA + PROX_HAWKER_MARKET + 
    PROX_KINDERGARTEN + PROX_MRT + PROX_PARK + PROX_PRIMARY_SCH + 
    PROX_TOP_PRIMARY_SCH + PROX_SHOPPING_MALL + PROX_SUPERMARKET + 
    PROX_BUS_STOP + NO_Of_UNITS + FAMILY_FRIENDLY + FREEHOLD + 
    LEASEHOLD_99YR, data = condo_resale_sf)

Residuals:
     Min       1Q   Median       3Q      Max 
-3471036  -286903   -22426   239412 12254549 

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)           543071.4   136210.9   3.987 7.03e-05 ***
AREA_SQM               12688.7      370.1  34.283  < 2e-16 ***
AGE                   -24566.0     2766.0  -8.881  < 2e-16 ***
PROX_CBD              -78122.0     6791.4 -11.503  < 2e-16 ***
PROX_CHILDCARE       -333219.0   111020.3  -3.001 0.002734 ** 
PROX_ELDERLYCARE      170950.0    42110.8   4.060 5.19e-05 ***
PROX_URA_GROWTH_AREA   38507.6    12523.7   3.075 0.002147 ** 
PROX_HAWKER_MARKET     23801.2    29299.9   0.812 0.416739    
PROX_KINDERGARTEN     144098.0    82738.7   1.742 0.081795 .  
PROX_MRT             -322775.9    58528.1  -5.515 4.14e-08 ***
PROX_PARK             564487.9    66563.0   8.481  < 2e-16 ***
PROX_PRIMARY_SCH      186170.5    65515.2   2.842 0.004553 ** 
PROX_TOP_PRIMARY_SCH    -477.1    20598.0  -0.023 0.981525    
PROX_SHOPPING_MALL   -207721.5    42855.5  -4.847 1.39e-06 ***
PROX_SUPERMARKET      -48074.7    77145.3  -0.623 0.533273    
PROX_BUS_STOP         675755.0   138552.0   4.877 1.20e-06 ***
NO_Of_UNITS             -216.2       90.3  -2.394 0.016797 *  
FAMILY_FRIENDLY       142128.3    47055.1   3.020 0.002569 ** 
FREEHOLD              300646.5    77296.5   3.890 0.000105 ***
LEASEHOLD_99YR        -77137.4    77570.9  -0.994 0.320192    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 755800 on 1416 degrees of freedom
Multiple R-squared:  0.652, Adjusted R-squared:  0.6474 
F-statistic: 139.6 on 19 and 1416 DF,  p-value: < 2.2e-16

8 Model Assessment: olsrr method

In this section, we introduce an excellent R package designed specifically for conducting Ordinary Least Squares (OLS) regression: olsrr. This package offers a comprehensive set of tools to enhance the development of multiple linear regression models. Key features include:

  • Detailed regression output
  • Diagnostic tools for residual analysis
  • Influence measures
  • Tests for heteroskedasticity
  • Model fit evaluation
  • Assessment of variable contributions
  • Procedures for variable selection

These functionalities make olsrr a powerful resource for building and refining regression models in R.

8.1 Generating tidy linear regression report

ols_regress(condo_mlr) # global model
                                Model Summary                                 
-----------------------------------------------------------------------------
R                            0.807       RMSE                     750537.537 
R-Squared                    0.652       MSE                571262902261.223 
Adj. R-Squared               0.647       Coef. Var                    43.160 
Pred R-Squared               0.637       AIC                       42971.173 
MAE                     412117.987       SBC                       43081.835 
-----------------------------------------------------------------------------
 RMSE: Root Mean Square Error 
 MSE: Mean Square Error 
 MAE: Mean Absolute Error 
 AIC: Akaike Information Criteria 
 SBC: Schwarz Bayesian Criteria 

                                     ANOVA                                       
--------------------------------------------------------------------------------
                    Sum of                                                      
                   Squares          DF         Mean Square       F         Sig. 
--------------------------------------------------------------------------------
Regression    1.515738e+15          19        7.977571e+13    139.648    0.0000 
Residual      8.089083e+14        1416    571262902261.223                      
Total         2.324647e+15        1435                                          
--------------------------------------------------------------------------------

                                               Parameter Estimates                                                
-----------------------------------------------------------------------------------------------------------------
               model           Beta    Std. Error    Std. Beta       t        Sig           lower          upper 
-----------------------------------------------------------------------------------------------------------------
         (Intercept)     543071.420    136210.918                   3.987    0.000     275874.535     810268.305 
            AREA_SQM      12688.669       370.119        0.579     34.283    0.000      11962.627      13414.710 
                 AGE     -24566.001      2766.041       -0.166     -8.881    0.000     -29991.980     -19140.022 
            PROX_CBD     -78121.985      6791.377       -0.267    -11.503    0.000     -91444.227     -64799.744 
      PROX_CHILDCARE    -333219.036    111020.303       -0.087     -3.001    0.003    -551000.984    -115437.089 
    PROX_ELDERLYCARE     170949.961     42110.748        0.083      4.060    0.000      88343.803     253556.120 
PROX_URA_GROWTH_AREA      38507.622     12523.661        0.059      3.075    0.002      13940.700      63074.545 
  PROX_HAWKER_MARKET      23801.197     29299.923        0.019      0.812    0.417     -33674.725      81277.120 
   PROX_KINDERGARTEN     144097.972     82738.669        0.030      1.742    0.082     -18205.570     306401.514 
            PROX_MRT    -322775.874     58528.079       -0.123     -5.515    0.000    -437586.937    -207964.811 
           PROX_PARK     564487.876     66563.011        0.148      8.481    0.000     433915.162     695060.590 
    PROX_PRIMARY_SCH     186170.524     65515.193        0.072      2.842    0.005      57653.253     314687.795 
PROX_TOP_PRIMARY_SCH       -477.073     20597.972       -0.001     -0.023    0.982     -40882.894      39928.747 
  PROX_SHOPPING_MALL    -207721.520     42855.500       -0.109     -4.847    0.000    -291788.613    -123654.427 
    PROX_SUPERMARKET     -48074.679     77145.257       -0.012     -0.623    0.533    -199405.956     103256.599 
       PROX_BUS_STOP     675755.044    138551.991        0.133      4.877    0.000     403965.817     947544.272 
         NO_Of_UNITS       -216.180        90.302       -0.046     -2.394    0.017       -393.320        -39.040 
     FAMILY_FRIENDLY     142128.272     47055.082        0.056      3.020    0.003      49823.107     234433.438 
            FREEHOLD     300646.543     77296.529        0.117      3.890    0.000     149018.525     452274.561 
      LEASEHOLD_99YR     -77137.375     77570.869       -0.030     -0.994    0.320    -229303.551      75028.801 
-----------------------------------------------------------------------------------------------------------------

The ols_regress() function generates a tidier report of the condo_mlr results. In the ANOVA table, the p-value of the F-test is smaller than the alpha value of 0.05, so we can reject the null hypothesis. Based on the adjusted R-squared value, this multiple linear regression model explains 64.7% of the price variation.

PROX_HAWKER_MARKET, PROX_KINDERGARTEN, PROX_TOP_PRIMARY_SCH, PROX_SUPERMARKET and LEASEHOLD_99YR are not statistically significant, with p-values above 0.05. This indicates that they are candidates for elimination when refining the model later on.
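The summary figures in the report can be traced back to the ANOVA table. The sketch below recomputes R-squared, adjusted R-squared and the F-statistic from the sums of squares and degrees of freedom printed by ols_regress(condo_mlr). The numbers are copied from the report, and the variable names are chosen for illustration.

```r
# Values copied from the ANOVA table above.
ss_regression <- 1.515738e+15
ss_residual   <- 8.089083e+14
ss_total      <- 2.324647e+15
df_regression <- 19     # number of independent variables
df_residual   <- 1416   # n - p - 1 = 1436 - 19 - 1
df_total      <- df_regression + df_residual

r2     <- 1 - ss_residual / ss_total
adj_r2 <- 1 - (ss_residual / df_residual) / (ss_total / df_total)
f_stat <- (ss_regression / df_regression) / (ss_residual / df_residual)

round(c(r2 = r2, adj_r2 = adj_r2, f_stat = f_stat), 4)
```

Up to the rounding of the printed sums of squares, these values should agree with the R-Squared, Adj. R-Squared and F entries in the report.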

8.2 Multicollinearity

Variance Inflation Factors (VIF) are calculated in this section, after the model has been calibrated. The steps followed are:

  • Refer to the ANOVA table to reject the null hypothesis
  • Check the adjusted R-squared value
  • Only then go on to the parameter estimates

ols_vif_tol(condo_mlr)
              Variables Tolerance      VIF
1              AREA_SQM 0.8601326 1.162611
2                   AGE 0.7011585 1.426211
3              PROX_CBD 0.4575471 2.185567
4        PROX_CHILDCARE 0.2898233 3.450378
5      PROX_ELDERLYCARE 0.5922238 1.688551
6  PROX_URA_GROWTH_AREA 0.6614081 1.511926
7    PROX_HAWKER_MARKET 0.4373874 2.286303
8     PROX_KINDERGARTEN 0.8356793 1.196631
9              PROX_MRT 0.4949877 2.020252
10            PROX_PARK 0.8015728 1.247547
11     PROX_PRIMARY_SCH 0.3823248 2.615577
12 PROX_TOP_PRIMARY_SCH 0.4878620 2.049760
13   PROX_SHOPPING_MALL 0.4903052 2.039546
14     PROX_SUPERMARKET 0.6142127 1.628100
15        PROX_BUS_STOP 0.3311024 3.020213
16          NO_Of_UNITS 0.6543336 1.528272
17      FAMILY_FRIENDLY 0.7191719 1.390488
18             FREEHOLD 0.2728521 3.664990
19       LEASEHOLD_99YR 0.2645988 3.779307

Based on the results, none of the VIF values is greater than 5. Each VIF is obtained by regressing one independent variable on all the other independent variables, so it measures how much of that variable is already explained by the rest. This shows that there is no need to eliminate any variable on multicollinearity grounds. A common rule of thumb for reading VIF values is:

  • 0 to 5: variables are not correlated
  • 5 to 10: variables are correlated
  • Greater than 10: variables are highly correlated

Note that the binary (dummy) variables describing tenure, LEASEHOLD_99YR and FREEHOLD, show some signs of correlation: they have the two highest VIF values in the table (3.78 and 3.66 respectively).
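The Tolerance and VIF columns can be reproduced by hand: for each independent variable, regress it on the remaining ones, take the R-squared of that auxiliary regression, and compute VIF = 1 / (1 - R-squared), with Tolerance = 1 / VIF. The sketch below does this in base R on simulated data. The variables x1, x2 and x3 are illustrative, and this is not the olsrr source code.

```r
set.seed(123)
n  <- 300
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)   # deliberately correlated with x1
x3 <- rnorm(n)                        # independent of the others
X  <- data.frame(x1, x2, x3)

# One auxiliary regression per variable: variable ~ all other variables.
vif_manual <- sapply(names(X), function(v) {
  aux <- lm(reformulate(setdiff(names(X), v), response = v), data = X)
  1 / (1 - summary(aux)$r.squared)
})
tolerance <- 1 / vif_manual

round(rbind(tolerance, vif = vif_manual), 3)
```

Here x1 and x2 receive inflated VIF values because each is partly explained by the other, while x3 stays close to 1.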

8.3 Variable Selection

Stepwise regression is used for variable selection.

Forward stepwise: the model starts with no independent variables and candidates are added one at a time. Each time a variable is added, the R-squared and adjusted R-squared are recalculated and the entry criteria are checked (e.g. at a significance level of 0.05, a variable with a p-value above 0.05 is rejected; to enter, a variable must have a p-value below 0.05 and must improve the R-squared value).

Backward stepwise: all variables start inside the model and are removed one at a time, based on how the adjusted R-squared changes and on criteria such as the p-value.

Both methods work without replacement: once a variable has been added or rejected in an iteration, it cannot be placed back in the model.

Mixed stepwise: uses the forward stepwise method, but with replacement.

These functions are built into the olsrr package.
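The forward stepwise logic described above can be sketched in a few lines of base R. The function forward_p() below is a simplified, hypothetical illustration on simulated data: at each step it adds the candidate with the smallest p-value, provided that p-value is below the entry threshold, and never removes a variable once added. It is not the olsrr implementation.

```r
# Hypothetical helper: forward selection with a p-value entry criterion.
forward_p <- function(data, response, candidates, p_enter = 0.05) {
  selected <- character(0)
  repeat {
    remaining <- setdiff(candidates, selected)
    if (length(remaining) == 0) break
    # p-value of each remaining candidate when added to the current model
    p_vals <- sapply(remaining, function(v) {
      fit <- lm(reformulate(c(selected, v), response = response), data = data)
      summary(fit)$coefficients[v, "Pr(>|t|)"]
    })
    best <- names(which.min(p_vals))
    if (p_vals[best] >= p_enter) break   # no candidate qualifies: stop
    selected <- c(selected, best)        # no replacement: it stays in
  }
  selected
}

# Simulated data: x1 and x2 drive y, x3 is pure noise.
set.seed(2024)
n   <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- 3 * sim$x1 - 2 * sim$x2 + rnorm(n)

selected <- forward_p(sim, "y", c("x1", "x2", "x3"))
selected
```

The order of the returned names is the order of entry, which corresponds to the "Selected =>" lines in the step-by-step report.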

condo_fw_mlr <- ols_step_forward_p( # forward selection using the p-value as the entry criterion
  condo_mlr,
  p_val = 0.05,
  details = TRUE) # details = TRUE prints every iteration together with the full report; details = FALSE suppresses the per-step output
Forward Selection Method 
------------------------

Candidate Terms: 

1. AREA_SQM 
2. AGE 
3. PROX_CBD 
4. PROX_CHILDCARE 
5. PROX_ELDERLYCARE 
6. PROX_URA_GROWTH_AREA 
7. PROX_HAWKER_MARKET 
8. PROX_KINDERGARTEN 
9. PROX_MRT 
10. PROX_PARK 
11. PROX_PRIMARY_SCH 
12. PROX_TOP_PRIMARY_SCH 
13. PROX_SHOPPING_MALL 
14. PROX_SUPERMARKET 
15. PROX_BUS_STOP 
16. NO_Of_UNITS 
17. FAMILY_FRIENDLY 
18. FREEHOLD 
19. LEASEHOLD_99YR 


Step   => 0 
Model  => SELLING_PRICE ~ 1 
R2     => 0 

Initiating stepwise selection... 

                          Selection Metrics Table                            
----------------------------------------------------------------------------
Predictor               Pr(>|t|)    R-Squared    Adj. R-Squared       AIC    
----------------------------------------------------------------------------
AREA_SQM                 0.00000        0.452             0.451    43587.753 
PROX_CBD                 0.00000        0.243             0.242    44051.772 
FREEHOLD                 0.00000        0.082             0.081    44328.539 
LEASEHOLD_99YR           0.00000        0.066             0.065    44353.172 
PROX_PARK                0.00000        0.049             0.048    44378.817 
NO_Of_UNITS              0.00000        0.048             0.048    44380.124 
PROX_PRIMARY_SCH         0.00000        0.032             0.032    44403.847 
PROX_HAWKER_MARKET       0.00000        0.023             0.022    44417.505 
PROX_CHILDCARE           0.00000        0.021             0.021    44420.298 
PROX_ELDERLYCARE         0.00000        0.021             0.020    44420.546 
PROX_BUS_STOP            0.00000        0.021             0.020    44420.742 
PROX_KINDERGARTEN          2e-05        0.013             0.012    44432.322 
PROX_SUPERMARKET         0.00088        0.008             0.007    44439.977 
PROX_SHOPPING_MALL       0.00154        0.007             0.006    44441.023 
FAMILY_FRIENDLY          0.00907        0.005             0.004    44444.248 
PROX_MRT                 0.01071        0.005             0.004    44444.545 
PROX_URA_GROWTH_AREA     0.13510        0.002             0.001    44448.832 
PROX_TOP_PRIMARY_SCH     0.23180        0.001             0.000    44449.636 
AGE                      0.52978        0.000             0.000    44450.673 
----------------------------------------------------------------------------

Step      => 1 
Selected  => AREA_SQM 
Model     => SELLING_PRICE ~ AREA_SQM 
R2        => 0.452 

                          Selection Metrics Table                            
----------------------------------------------------------------------------
Predictor               Pr(>|t|)    R-Squared    Adj. R-Squared       AIC    
----------------------------------------------------------------------------
PROX_CBD                 0.00000        0.569             0.569    43243.523 
FREEHOLD                 0.00000        0.487             0.487    43493.627 
PROX_PARK                0.00000        0.478             0.478    43518.542 
LEASEHOLD_99YR           0.00000        0.475             0.474    43527.150 
AGE                      0.00000        0.471             0.470    43538.063 
PROX_SHOPPING_MALL       0.00000        0.467             0.466    43549.216 
PROX_HAWKER_MARKET       0.00000        0.465             0.464    43555.065 
PROX_MRT                 0.00000        0.465             0.464    43556.097 
NO_Of_UNITS              0.00000        0.464             0.463    43557.089 
PROX_SUPERMARKET         0.00000        0.461             0.461    43564.792 
PROX_PRIMARY_SCH           3e-05        0.458             0.458    43572.418 
PROX_ELDERLYCARE           5e-05        0.458             0.457    43573.203 
PROX_URA_GROWTH_AREA       9e-05        0.458             0.457    43574.292 
FAMILY_FRIENDLY          0.00026        0.457             0.456    43576.392 
PROX_CHILDCARE           0.00275        0.455             0.455    43580.768 
PROX_BUS_STOP            0.00381        0.455             0.454    43581.362 
PROX_KINDERGARTEN        0.15757        0.453             0.452    43587.751 
PROX_TOP_PRIMARY_SCH     0.47485        0.452             0.451    43589.241 
----------------------------------------------------------------------------

Step      => 2 
Selected  => PROX_CBD 
Model     => SELLING_PRICE ~ AREA_SQM + PROX_CBD 
R2        => 0.569 

                          Selection Metrics Table                            
----------------------------------------------------------------------------
Predictor               Pr(>|t|)    R-Squared    Adj. R-Squared       AIC    
----------------------------------------------------------------------------
PROX_PARK                0.00000        0.589             0.588    43177.691 
AGE                      0.00000        0.586             0.585    43188.935 
FREEHOLD                 0.00000        0.579             0.578    43213.005 
PROX_ELDERLYCARE         0.00000        0.578             0.577    43216.850 
PROX_TOP_PRIMARY_SCH     0.00000        0.577             0.576    43218.861 
LEASEHOLD_99YR           0.00000        0.576             0.575    43224.500 
PROX_HAWKER_MARKET         1e-05        0.575             0.574    43225.123 
PROX_SHOPPING_MALL         8e-05        0.574             0.573    43229.948 
PROX_SUPERMARKET         0.00147        0.572             0.571    43235.376 
PROX_MRT                 0.00613        0.572             0.571    43237.989 
NO_Of_UNITS              0.01059        0.571             0.570    43238.970 
PROX_PRIMARY_SCH         0.04530        0.570             0.570    43241.503 
PROX_BUS_STOP            0.06634        0.570             0.569    43242.142 
FAMILY_FRIENDLY          0.11212        0.570             0.569    43242.991 
PROX_CHILDCARE           0.29768        0.570             0.569    43244.435 
PROX_URA_GROWTH_AREA     0.78658        0.569             0.568    43245.450 
PROX_KINDERGARTEN        0.80879        0.569             0.568    43245.465 
----------------------------------------------------------------------------

Step      => 3 
Selected  => PROX_PARK 
Model     => SELLING_PRICE ~ AREA_SQM + PROX_CBD + PROX_PARK 
R2        => 0.589 

                          Selection Metrics Table                            
----------------------------------------------------------------------------
Predictor               Pr(>|t|)    R-Squared    Adj. R-Squared       AIC    
----------------------------------------------------------------------------
FREEHOLD                 0.00000        0.604             0.603    43125.474 
AGE                      0.00000        0.602             0.601    43132.534 
LEASEHOLD_99YR           0.00000        0.601             0.600    43138.902 
PROX_ELDERLYCARE         0.00000        0.596             0.595    43153.932 
PROX_TOP_PRIMARY_SCH       3e-05        0.594             0.593    43162.363 
NO_Of_UNITS              0.00013        0.593             0.592    43164.977 
PROX_SHOPPING_MALL       0.00015        0.593             0.592    43165.286 
PROX_HAWKER_MARKET         7e-04        0.592             0.591    43168.151 
PROX_MRT                 0.00250        0.592             0.591    43170.516 
FAMILY_FRIENDLY          0.02445        0.591             0.589    43174.609 
PROX_SUPERMARKET         0.02905        0.591             0.589    43174.908 
PROX_URA_GROWTH_AREA     0.14518        0.590             0.589    43177.560 
PROX_CHILDCARE           0.31093        0.589             0.588    43178.660 
PROX_PRIMARY_SCH         0.34515        0.589             0.588    43178.796 
PROX_BUS_STOP            0.47898        0.589             0.588    43179.188 
PROX_KINDERGARTEN        0.87351        0.589             0.588    43179.665 
----------------------------------------------------------------------------

Step      => 4 
Selected  => FREEHOLD 
Model     => SELLING_PRICE ~ AREA_SQM + PROX_CBD + PROX_PARK + FREEHOLD 
R2        => 0.604 

                          Selection Metrics Table                            
----------------------------------------------------------------------------
Predictor               Pr(>|t|)    R-Squared    Adj. R-Squared       AIC    
----------------------------------------------------------------------------
AGE                      0.00000        0.620             0.619    43069.222 
PROX_SHOPPING_MALL       0.00000        0.611             0.609    43104.195 
PROX_ELDERLYCARE           5e-05        0.609             0.608    43111.036 
PROX_TOP_PRIMARY_SCH       7e-05        0.609             0.607    43111.551 
PROX_HAWKER_MARKET       0.00088        0.607             0.606    43116.360 
PROX_SUPERMARKET         0.00324        0.607             0.605    43118.765 
PROX_MRT                 0.00345        0.607             0.605    43118.882 
PROX_BUS_STOP            0.09204        0.605             0.604    43124.623 
FAMILY_FRIENDLY          0.11599        0.605             0.604    43124.992 
PROX_PRIMARY_SCH         0.21752        0.605             0.603    43125.946 
NO_Of_UNITS              0.25242        0.605             0.603    43126.158 
PROX_URA_GROWTH_AREA     0.27640        0.605             0.603    43126.284 
LEASEHOLD_99YR           0.49846        0.605             0.603    43127.014 
PROX_KINDERGARTEN        0.66364        0.604             0.603    43127.284 
PROX_CHILDCARE           0.82289        0.604             0.603    43127.424 
----------------------------------------------------------------------------

Step      => 5 
Selected  => AGE 
Model     => SELLING_PRICE ~ AREA_SQM + PROX_CBD + PROX_PARK + FREEHOLD + AGE 
R2        => 0.62 

                          Selection Metrics Table                            
----------------------------------------------------------------------------
Predictor               Pr(>|t|)    R-Squared    Adj. R-Squared       AIC    
----------------------------------------------------------------------------
PROX_ELDERLYCARE         0.00000        0.627             0.625    43046.515 
PROX_SHOPPING_MALL       0.00014        0.624             0.622    43056.710 
PROX_TOP_PRIMARY_SCH     0.00036        0.623             0.622    43058.400 
PROX_MRT                 0.00118        0.623             0.621    43060.651 
PROX_HAWKER_MARKET       0.00229        0.623             0.621    43061.874 
NO_Of_UNITS              0.03614        0.621             0.620    43066.808 
PROX_SUPERMARKET         0.03902        0.621             0.620    43066.940 
PROX_PRIMARY_SCH         0.04454        0.621             0.620    43067.165 
PROX_URA_GROWTH_AREA     0.05538        0.621             0.619    43067.532 
FAMILY_FRIENDLY          0.06368        0.621             0.619    43067.765 
PROX_BUS_STOP            0.09258        0.621             0.619    43068.378 
LEASEHOLD_99YR           0.33191        0.620             0.619    43070.276 
PROX_KINDERGARTEN        0.54422        0.620             0.619    43070.852 
PROX_CHILDCARE           0.76117        0.620             0.619    43071.129 
----------------------------------------------------------------------------

Step      => 6 
Selected  => PROX_ELDERLYCARE 
Model     => SELLING_PRICE ~ AREA_SQM + PROX_CBD + PROX_PARK + FREEHOLD + AGE + PROX_ELDERLYCARE 
R2        => 0.627 

                          Selection Metrics Table                            
----------------------------------------------------------------------------
Predictor               Pr(>|t|)    R-Squared    Adj. R-Squared       AIC    
----------------------------------------------------------------------------
PROX_SHOPPING_MALL       0.00000        0.634             0.632    43020.990 
PROX_MRT                 0.00000        0.633             0.631    43024.733 
PROX_SUPERMARKET         0.00311        0.629             0.627    43039.719 
PROX_CHILDCARE           0.00320        0.629             0.627    43039.776 
PROX_TOP_PRIMARY_SCH     0.02859        0.628             0.626    43043.694 
FAMILY_FRIENDLY          0.04001        0.628             0.626    43044.273 
PROX_URA_GROWTH_AREA     0.06111        0.628             0.626    43044.987 
PROX_HAWKER_MARKET       0.14370        0.627             0.625    43046.364 
NO_Of_UNITS              0.21750        0.627             0.625    43046.985 
LEASEHOLD_99YR           0.33225        0.627             0.625    43047.569 
PROX_PRIMARY_SCH         0.72554        0.627             0.625    43048.391 
PROX_BUS_STOP            0.73834        0.627             0.625    43048.403 
PROX_KINDERGARTEN        0.96832        0.627             0.625    43048.513 
----------------------------------------------------------------------------

Step      => 7 
Selected  => PROX_SHOPPING_MALL 
Model     => SELLING_PRICE ~ AREA_SQM + PROX_CBD + PROX_PARK + FREEHOLD + AGE + PROX_ELDERLYCARE + PROX_SHOPPING_MALL 
R2        => 0.634 

                          Selection Metrics Table                            
----------------------------------------------------------------------------
Predictor               Pr(>|t|)    R-Squared    Adj. R-Squared       AIC    
----------------------------------------------------------------------------
PROX_URA_GROWTH_AREA       2e-04        0.637             0.635    43009.092 
PROX_MRT                 0.00038        0.637             0.635    43010.278 
FAMILY_FRIENDLY          0.09004        0.634             0.632    43020.098 
NO_Of_UNITS              0.09561        0.634             0.632    43020.195 
PROX_BUS_STOP            0.10105        0.634             0.632    43020.284 
PROX_CHILDCARE           0.16782        0.634             0.632    43021.075 
PROX_PRIMARY_SCH         0.20169        0.634             0.632    43021.349 
PROX_HAWKER_MARKET       0.28053        0.634             0.632    43021.818 
PROX_SUPERMARKET         0.39017        0.634             0.632    43022.247 
LEASEHOLD_99YR           0.41342        0.634             0.632    43022.317 
PROX_KINDERGARTEN        0.64794        0.634             0.632    43022.781 
PROX_TOP_PRIMARY_SCH     0.88928        0.634             0.632    43022.971 
----------------------------------------------------------------------------

Step      => 8 
Selected  => PROX_URA_GROWTH_AREA 
Model     => SELLING_PRICE ~ AREA_SQM + PROX_CBD + PROX_PARK + FREEHOLD + AGE + PROX_ELDERLYCARE + PROX_SHOPPING_MALL + PROX_URA_GROWTH_AREA 
R2        => 0.637 

                          Selection Metrics Table                            
----------------------------------------------------------------------------
Predictor               Pr(>|t|)    R-Squared    Adj. R-Squared       AIC    
----------------------------------------------------------------------------
PROX_MRT                 0.00055        0.640             0.638    42999.058 
NO_Of_UNITS              0.04357        0.638             0.636    43006.989 
PROX_BUS_STOP            0.07301        0.638             0.636    43007.854 
FAMILY_FRIENDLY          0.07751        0.638             0.636    43007.953 
PROX_CHILDCARE           0.17683        0.638             0.635    43009.255 
LEASEHOLD_99YR           0.26341        0.638             0.635    43009.832 
PROX_SUPERMARKET         0.32522        0.637             0.635    43010.117 
PROX_TOP_PRIMARY_SCH     0.36995        0.637             0.635    43010.282 
PROX_HAWKER_MARKET       0.48716        0.637             0.635    43010.606 
PROX_KINDERGARTEN        0.49501        0.637             0.635    43010.623 
PROX_PRIMARY_SCH         0.60814        0.637             0.635    43010.827 
----------------------------------------------------------------------------

Step      => 9 
Selected  => PROX_MRT 
Model     => SELLING_PRICE ~ AREA_SQM + PROX_CBD + PROX_PARK + FREEHOLD + AGE + PROX_ELDERLYCARE + PROX_SHOPPING_MALL + PROX_URA_GROWTH_AREA + PROX_MRT 
R2        => 0.64 

                          Selection Metrics Table                            
----------------------------------------------------------------------------
Predictor               Pr(>|t|)    R-Squared    Adj. R-Squared       AIC    
----------------------------------------------------------------------------
PROX_BUS_STOP              6e-05        0.644             0.642    42984.951 
PROX_PRIMARY_SCH         0.01738        0.642             0.639    42995.355 
NO_Of_UNITS              0.04105        0.641             0.639    42996.851 
FAMILY_FRIENDLY          0.06468        0.641             0.639    42997.618 
PROX_TOP_PRIMARY_SCH     0.16342        0.641             0.638    42999.100 
LEASEHOLD_99YR           0.16895        0.641             0.638    42999.151 
PROX_KINDERGARTEN        0.19107        0.641             0.638    42999.335 
PROX_HAWKER_MARKET       0.19288        0.641             0.638    42999.349 
PROX_SUPERMARKET         0.45603        0.640             0.638    43000.498 
PROX_CHILDCARE           0.71809        0.640             0.638    43000.927 
----------------------------------------------------------------------------

Step      => 10 
Selected  => PROX_BUS_STOP 
Model     => SELLING_PRICE ~ AREA_SQM + PROX_CBD + PROX_PARK + FREEHOLD + AGE + PROX_ELDERLYCARE + PROX_SHOPPING_MALL + PROX_URA_GROWTH_AREA + PROX_MRT + PROX_BUS_STOP 
R2        => 0.644 

                          Selection Metrics Table                            
----------------------------------------------------------------------------
Predictor               Pr(>|t|)    R-Squared    Adj. R-Squared       AIC    
----------------------------------------------------------------------------
FAMILY_FRIENDLY          0.01590        0.646             0.643    42981.085 
PROX_CHILDCARE           0.02032        0.646             0.643    42981.519 
NO_Of_UNITS              0.03658        0.645             0.643    42982.543 
PROX_PRIMARY_SCH         0.06688        0.645             0.642    42983.563 
PROX_KINDERGARTEN        0.09160        0.645             0.642    42984.080 
LEASEHOLD_99YR           0.10015        0.645             0.642    42984.224 
PROX_TOP_PRIMARY_SCH     0.27924        0.645             0.642    42985.770 
PROX_HAWKER_MARKET       0.53937        0.644             0.642    42986.571 
PROX_SUPERMARKET         0.91393        0.644             0.641    42986.939 
----------------------------------------------------------------------------

Step      => 11 
Selected  => FAMILY_FRIENDLY 
Model     => SELLING_PRICE ~ AREA_SQM + PROX_CBD + PROX_PARK + FREEHOLD + AGE + PROX_ELDERLYCARE + PROX_SHOPPING_MALL + PROX_URA_GROWTH_AREA + PROX_MRT + PROX_BUS_STOP + FAMILY_FRIENDLY 
R2        => 0.646 

                          Selection Metrics Table                            
----------------------------------------------------------------------------
Predictor               Pr(>|t|)    R-Squared    Adj. R-Squared       AIC    
----------------------------------------------------------------------------
NO_Of_UNITS              0.00533        0.648             0.645    42975.246 
PROX_CHILDCARE           0.01908        0.647             0.644    42977.539 
PROX_PRIMARY_SCH         0.06018        0.647             0.644    42979.519 
LEASEHOLD_99YR           0.06704        0.647             0.644    42979.699 
PROX_KINDERGARTEN        0.09772        0.646             0.643    42980.317 
PROX_TOP_PRIMARY_SCH     0.31070        0.646             0.643    42982.048 
PROX_HAWKER_MARKET       0.66885        0.646             0.643    42982.901 
PROX_SUPERMARKET         0.92593        0.646             0.643    42983.077 
----------------------------------------------------------------------------

Step      => 12 
Selected  => NO_Of_UNITS 
Model     => SELLING_PRICE ~ AREA_SQM + PROX_CBD + PROX_PARK + FREEHOLD + AGE + PROX_ELDERLYCARE + PROX_SHOPPING_MALL + PROX_URA_GROWTH_AREA + PROX_MRT + PROX_BUS_STOP + FAMILY_FRIENDLY + NO_Of_UNITS 
R2        => 0.648 

                          Selection Metrics Table                            
----------------------------------------------------------------------------
Predictor               Pr(>|t|)    R-Squared    Adj. R-Squared       AIC    
----------------------------------------------------------------------------
PROX_CHILDCARE           0.02092        0.649             0.646    42971.858 
PROX_PRIMARY_SCH         0.05496        0.649             0.645    42973.525 
PROX_KINDERGARTEN        0.13311        0.648             0.645    42974.967 
LEASEHOLD_99YR           0.16053        0.648             0.645    42975.257 
PROX_TOP_PRIMARY_SCH     0.28337        0.648             0.645    42976.084 
PROX_HAWKER_MARKET       0.62348        0.648             0.644    42977.003 
PROX_SUPERMARKET         0.65604        0.648             0.644    42977.046 
----------------------------------------------------------------------------

Step      => 13 
Selected  => PROX_CHILDCARE 
Model     => SELLING_PRICE ~ AREA_SQM + PROX_CBD + PROX_PARK + FREEHOLD + AGE + PROX_ELDERLYCARE + PROX_SHOPPING_MALL + PROX_URA_GROWTH_AREA + PROX_MRT + PROX_BUS_STOP + FAMILY_FRIENDLY + NO_Of_UNITS + PROX_CHILDCARE 
R2        => 0.649 

                          Selection Metrics Table                            
----------------------------------------------------------------------------
Predictor               Pr(>|t|)    R-Squared    Adj. R-Squared       AIC    
----------------------------------------------------------------------------
PROX_PRIMARY_SCH         0.00805        0.651             0.647    42966.758 
PROX_KINDERGARTEN        0.08599        0.650             0.646    42970.878 
PROX_TOP_PRIMARY_SCH     0.23060        0.649             0.646    42972.405 
LEASEHOLD_99YR           0.32104        0.649             0.646    42972.863 
PROX_HAWKER_MARKET       0.49652        0.649             0.646    42973.391 
PROX_SUPERMARKET         0.59607        0.649             0.646    42973.574 
----------------------------------------------------------------------------

Step      => 14 
Selected  => PROX_PRIMARY_SCH 
Model     => SELLING_PRICE ~ AREA_SQM + PROX_CBD + PROX_PARK + FREEHOLD + AGE + PROX_ELDERLYCARE + PROX_SHOPPING_MALL + PROX_URA_GROWTH_AREA + PROX_MRT + PROX_BUS_STOP + FAMILY_FRIENDLY + NO_Of_UNITS + PROX_CHILDCARE + PROX_PRIMARY_SCH 
R2        => 0.651 

                          Selection Metrics Table                            
----------------------------------------------------------------------------
Predictor               Pr(>|t|)    R-Squared    Adj. R-Squared       AIC    
----------------------------------------------------------------------------
PROX_KINDERGARTEN        0.07528        0.651             0.648    42965.558 
LEASEHOLD_99YR           0.24093        0.651             0.647    42967.367 
PROX_HAWKER_MARKET       0.29790        0.651             0.647    42967.662 
PROX_TOP_PRIMARY_SCH     0.38435        0.651             0.647    42967.993 
PROX_SUPERMARKET         0.76578        0.651             0.647    42968.669 
----------------------------------------------------------------------------


No more variables to be added.

Variables Selected: 

=> AREA_SQM 
=> PROX_CBD 
=> PROX_PARK 
=> FREEHOLD 
=> AGE 
=> PROX_ELDERLYCARE 
=> PROX_SHOPPING_MALL 
=> PROX_URA_GROWTH_AREA 
=> PROX_MRT 
=> PROX_BUS_STOP 
=> FAMILY_FRIENDLY 
=> NO_Of_UNITS 
=> PROX_CHILDCARE 
=> PROX_PRIMARY_SCH 

Forward selection adds one predictor at a time based on its p-value, so only statistically significant predictors are kept.

The resulting condo_fw_mlr object is a list with three components: metrics, model and others.
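To make the selection rule concrete, here is a minimal base R sketch of a single forward-selection step on simulated data. The data frame `dat` and the variables `x1`, `x2`, `x3` are hypothetical, and `add1()` is used as a stand-in for the idea; olsrr's own implementation may differ in detail.

```r
# Simulated data (hypothetical): y depends strongly on x1, weakly on x2, not on x3
set.seed(123)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 5 + 3 * dat$x1 + 0.5 * dat$x2 + rnorm(n)

# Start from the intercept-only model and test each candidate predictor in turn
base_fit <- lm(y ~ 1, data = dat)
step1 <- add1(base_fit, scope = ~ x1 + x2 + x3, test = "F")
print(step1)

# Pick the candidate with the smallest p-value (the first row is "<none>")
pvals <- step1[["Pr(>F)"]][-1]
names(pvals) <- rownames(step1)[-1]
best <- names(which.min(pvals))

# Add the winner to the model; a full procedure repeats this until
# no remaining candidate falls below the chosen p-value threshold
fit1 <- update(base_fit, as.formula(paste(". ~ . +", best)))
```

Repeating this step and stopping when no candidate is significant reproduces the general shape of the step-by-step tables above.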

plot(condo_fw_mlr)

8.4 Visualising model parameters

ggcoefstats(condo_mlr,
            sort = "ascending")
Number of labels is greater than default palette color count.
• Select another color `palette` (and/or `package`).

8.5 Test for Non-Linearity

In multiple linear regression, it is important to test the assumption of linearity and additivity of the relationship between the dependent and independent variables.

In the code chunk below, ols_plot_resid_fit() of the olsrr package is used to test the linearity assumption.

ols_plot_resid_fit(condo_fw_mlr$model)

The figure above reveals that most of the data points are scattered around the 0 line, hence we can safely conclude that the relationships between the dependent and independent variables are linear.

8.6 Tests for Normality Assumption

In the code chunk below, ols_plot_resid_hist() of olsrr package is used to perform normality assumption test.

ols_plot_resid_hist(condo_fw_mlr$model)

The figure above reveals that the residuals of the multiple linear regression model (i.e. condo_fw_mlr$model) resemble a normal distribution.

For formal statistical test methods, the ols_test_normality() of olsrr package can be used as shown in the code chunk below.

ols_test_normality(condo_fw_mlr$model)
Warning in ks.test.default(y, "pnorm", mean(y), sd(y)): ties should not be
present for the one-sample Kolmogorov-Smirnov test
-----------------------------------------------
       Test             Statistic       pvalue  
-----------------------------------------------
Shapiro-Wilk              0.6856         0.0000 
Kolmogorov-Smirnov        0.1366         0.0000 
Cramer-von Mises         121.0768        0.0000 
Anderson-Darling         67.9551         0.0000 
-----------------------------------------------

The summary table reveals that the p-values of the four tests are far smaller than the alpha value of 0.05. Hence, we reject the null hypothesis and infer that there is statistical evidence that the residuals are not normally distributed.
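The same kind of check can be sketched in base R without olsrr. The example below is only an illustration on simulated data with deliberately right-skewed errors; the object names (`fit`, `sw`, `ks`) are hypothetical and not part of this exercise.

```r
# Simulated regression with right-skewed (non-normal) errors
set.seed(2024)
x <- rnorm(500)
y <- 1 + 2 * x + (rexp(500) - 1)
fit <- lm(y ~ x)
res <- residuals(fit)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal
sw <- shapiro.test(res)

# Kolmogorov-Smirnov test against a normal with the residuals' mean and sd
ks <- ks.test(res, "pnorm", mean(res), sd(res))

sw$p.value
ks$p.value
```

A p-value below 0.05 in either test leads to the same conclusion as above: the null hypothesis of normally distributed residuals is rejected.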

8.7 Testing for spatial autocorrelation

The hedonic model to be built will utilise geographically referenced attributes, hence it is also important for us to visualise the residual of the hedonic pricing model.

First, we will export the residual of the hedonic pricing model and save it as a data frame.

mlr_output <- as.data.frame(condo_fw_mlr$model$residuals) %>%
  rename(`FW_MLR_RES` = `condo_fw_mlr$model$residuals`) # renamed to shorten the field name

Next, we will join the newly created data frame with condo_resale_sf object.

condo_resale_sf <- cbind(condo_resale_sf, # cbind() appends the residuals column-wise; the two tables share no common identifier, so a join is not possible
                         mlr_output$FW_MLR_RES) %>%
  rename(`MLR_RES` = `mlr_output.FW_MLR_RES`)

Next, we will use tmap package to display the distribution of the residuals on an interactive map.

The code chunk below turns on the interactive mode of tmap.

tmap_mode("view")
tmap mode set to interactive viewing
tm_shape(mpsz) +
  tmap_options(check.and.fix = TRUE) + # repairs invalid polygon geometries in `mpsz`; this option can also be set once at the start of the document
  tm_polygons(alpha = 0.4) + # error due to a HDB flat polygon left in the dataset
tm_shape(condo_resale_sf) +
  tm_dots(col = "MLR_RES",
          alpha = 0.6,
          style = "quantile")
Warning: The shape mpsz is invalid (after reprojection). See sf::st_is_valid
Variable(s) "MLR_RES" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
tmap_mode("plot") # used to switch the mode back to plot
tmap mode set to plotting
Note

The plot above reveals that there are signs of spatial autocorrelation.

8.8 Spatial stationarity test

To validate our observation, we will conduct the Moran’s I test.

  • Null hypothesis (Ho): The residuals are randomly distributed (i.e., spatially stationary).
  • Alternative hypothesis (H1): The residuals are not randomly distributed and are spatially non-stationary.

As a first step, we will derive k-nearest-neighbour spatial weights using the st_knn() and st_weights() functions from the sfdep package.

Note

The residual is the difference between the actual transacted price and the price estimated by the model. A darker green shade indicates that the estimated price is higher than the actual transacted price.

On the other hand, the lighter colour represents actual transacted prices that are much lower than the estimated price.

Moran’s I test will be performed with the code chunk below.

The latest version of GWmodel also facilitates the use of sfdep.

condo_resale_sf <- condo_resale_sf %>%
  mutate(nb = st_knn(geometry, k = 6, # six nearest neighbours
                     longlat = FALSE), # data is already projected, so planar rather than great-circle distances are used
         wt = st_weights(nb,
                         style = "W"),
         .before = 1)

Next, global_moran_perm() of sfdep is used to perform global Moran permutation test.

global_moran_perm(condo_resale_sf$MLR_RES, # data from condo_resale_sf and MLR_RES is the column that will be used
                  condo_resale_sf$nb,
                  condo_resale_sf$wt,
                  alternative = "two.sided",
                  nsim = 99) # 99 simulations plus the observed statistic give 100 values

    Monte-Carlo simulation of Moran I

data:  x 
weights: listw  
number of simulations + 1: 100 

statistic = 0.32254, observed rank = 100, p-value < 2.2e-16
alternative hypothesis: two.sided

The Global Moran's I test for residual spatial autocorrelation shows that its p-value is less than 2.2e-16, which is far below the alpha value of 0.05. Hence, we reject the null hypothesis that the residuals are randomly distributed.

Since the observed Global Moran's I statistic of 0.32254 is greater than 0, we can infer that the residuals are clustered.
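For intuition, the statistic and its permutation test can be sketched in base R. This is an illustration on simulated points, not the sfdep implementation: the coordinates, the variable `x` and the helper `moran_i()` are all hypothetical, and the pseudo p-value below is one-sided for simplicity.

```r
set.seed(42)
n <- 200
coords <- cbind(runif(n, 0, 1000), runif(n, 0, 1000))  # hypothetical projected coordinates (metres)
x <- coords[, 1] / 1000 + rnorm(n, sd = 0.2)           # values with a west-east trend

# Row-standardised weights from the k nearest neighbours (k = 6, as above)
k <- 6
d <- as.matrix(dist(coords))
diag(d) <- Inf                                         # a point is not its own neighbour
W <- matrix(0, n, n)
for (i in seq_len(n)) {
  nb <- order(d[i, ])[1:k]                             # indices of the k nearest neighbours
  W[i, nb] <- 1 / k                                    # each row sums to 1
}

# Moran's I = (n / S0) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i^2), with S0 = sum(W)
moran_i <- function(v, W) {
  z <- v - mean(v)
  (length(v) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

obs <- moran_i(x, W)

# Permutation test: shuffle the values over the locations 99 times
sims <- replicate(99, moran_i(sample(x), W))
p_upper <- (sum(sims >= obs) + 1) / (99 + 1)           # one-sided pseudo p-value

obs
p_upper
```

A positive statistic that exceeds nearly all of the permuted values indicates clustering, which is the pattern reported for the residuals above.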

9 Building Hedonic Pricing Models using GWmodel

This section will illustrate how to model hedonic pricing by using a geographically weighted regression model. Two spatial weighting schemes are used:

  • fixed bandwidth scheme
  • adaptive bandwidth scheme
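The difference between the two schemes can be sketched in base R. In this illustration on simulated points (all names and values are hypothetical), a fixed bandwidth uses the same distance everywhere, while an adaptive bandwidth uses the distance to the k-th nearest neighbour, so it shrinks in dense areas and grows in sparse ones.

```r
set.seed(7)
coords <- rbind(
  cbind(rnorm(50, 2000, 200), rnorm(50, 2000, 200)),   # dense cluster of points
  cbind(runif(20, 0, 20000), runif(20, 0, 20000))      # sparse scatter of points
)
D <- as.matrix(dist(coords))

# Fixed scheme: one distance for every location, so the neighbour count varies
fixed_bw <- 1000
n_within_fixed <- rowSums(D <= fixed_bw) - 1           # subtract 1 to exclude the point itself

# Adaptive scheme: the neighbour count is fixed, so the distance varies
k <- 10
adaptive_bw <- apply(D, 1, function(d) sort(d)[k + 1]) # position 1 is the point itself (distance 0)

range(n_within_fixed)   # neighbour counts under the fixed bandwidth
range(adaptive_bw)      # bandwidth distances under the adaptive scheme
```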

9.1 Building Fixed bandwidth GWR Model

In the code chunk below, bw.gwr() of the GWmodel package is used to determine the optimal fixed bandwidth for the model. Notice that the argument adaptive is set to FALSE, indicating that we are interested in computing a fixed bandwidth.

Two approaches can be used to determine the stopping rule: the cross-validation (CV) approach and the corrected AIC (AICc) approach. The stopping rule is set with the approach argument.

bw.fixed <- bw.gwr(formula = SELLING_PRICE ~ AREA_SQM + AGE + PROX_CBD + 
                     PROX_CHILDCARE + PROX_ELDERLYCARE  + PROX_URA_GROWTH_AREA + 
                     PROX_MRT   + PROX_PARK + PROX_PRIMARY_SCH + 
                     PROX_SHOPPING_MALL + PROX_BUS_STOP + NO_Of_UNITS + 
                     FAMILY_FRIENDLY + FREEHOLD,
                   data = condo_resale_sf, 
                   approach = "CV", # CV
                   kernel = "gaussian", # has to be used in later steps for consistency
                   adaptive = FALSE, 
                   longlat = FALSE) # data is projected, so great-circle distances are not calculated
Fixed bandwidth: 17660.96 CV score: 8.259118e+14 
Fixed bandwidth: 10917.26 CV score: 7.970454e+14 
Fixed bandwidth: 6749.419 CV score: 7.273273e+14 
Fixed bandwidth: 4173.553 CV score: 6.300006e+14 
Fixed bandwidth: 2581.58 CV score: 5.404958e+14 
Fixed bandwidth: 1597.687 CV score: 4.857515e+14 
Fixed bandwidth: 989.6077 CV score: 4.722431e+14 
Fixed bandwidth: 613.7939 CV score: 1.378294e+16 
Fixed bandwidth: 1221.873 CV score: 4.778717e+14 
Fixed bandwidth: 846.0596 CV score: 4.791629e+14 
Fixed bandwidth: 1078.325 CV score: 4.751406e+14 
Fixed bandwidth: 934.7772 CV score: 4.72518e+14 
Fixed bandwidth: 1023.495 CV score: 4.730305e+14 
Fixed bandwidth: 968.6643 CV score: 4.721317e+14 
Fixed bandwidth: 955.7206 CV score: 4.722072e+14 
Fixed bandwidth: 976.6639 CV score: 4.721387e+14 
Fixed bandwidth: 963.7202 CV score: 4.721484e+14 
Fixed bandwidth: 971.7199 CV score: 4.721293e+14 
Fixed bandwidth: 973.6083 CV score: 4.721309e+14 
Fixed bandwidth: 970.5527 CV score: 4.721295e+14 
Fixed bandwidth: 972.4412 CV score: 4.721296e+14 
Fixed bandwidth: 971.2741 CV score: 4.721292e+14 
Fixed bandwidth: 970.9985 CV score: 4.721293e+14 
Fixed bandwidth: 971.4443 CV score: 4.721292e+14 
Fixed bandwidth: 971.5496 CV score: 4.721293e+14 
Fixed bandwidth: 971.3793 CV score: 4.721292e+14 
Fixed bandwidth: 971.3391 CV score: 4.721292e+14 
Fixed bandwidth: 971.3143 CV score: 4.721292e+14 
Fixed bandwidth: 971.3545 CV score: 4.721292e+14 
Fixed bandwidth: 971.3296 CV score: 4.721292e+14 
Fixed bandwidth: 971.345 CV score: 4.721292e+14 
Fixed bandwidth: 971.3355 CV score: 4.721292e+14 
Fixed bandwidth: 971.3413 CV score: 4.721292e+14 
Fixed bandwidth: 971.3377 CV score: 4.721292e+14 
Fixed bandwidth: 971.34 CV score: 4.721292e+14 
Fixed bandwidth: 971.3405 CV score: 4.721292e+14 
Fixed bandwidth: 971.3408 CV score: 4.721292e+14 
Fixed bandwidth: 971.3403 CV score: 4.721292e+14 
Fixed bandwidth: 971.3406 CV score: 4.721292e+14 
Fixed bandwidth: 971.3404 CV score: 4.721292e+14 
Fixed bandwidth: 971.3405 CV score: 4.721292e+14 
Fixed bandwidth: 971.3405 CV score: 4.721292e+14 

The candidate bandwidths (in metres) generally become shorter as the search proceeds.

Some of the results are as shown:

  • Fixed bandwidth: 613.7939 CV score: 1.378294e+16
  • Fixed bandwidth: 1221.873 CV score: 4.778717e+14

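The bandwidth occasionally increases between iterations, as in the step from 613.7939 to 1221.873, because the search backtracks when a shorter bandwidth gives a worse CV score.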

For the values below:

  • Fixed bandwidth: 971.3405 CV score: 4.721292e+14
  • Fixed bandwidth: 971.3408 CV score: 4.721292e+14
  • Fixed bandwidth: 971.3403 CV score: 4.721292e+14
  • Fixed bandwidth: 971.3406 CV score: 4.721292e+14
  • Fixed bandwidth: 971.3404 CV score: 4.721292e+14
  • Fixed bandwidth: 971.3405 CV score: 4.721292e+14
  • Fixed bandwidth: 971.3405 CV score: 4.721292e+14

The bandwidth is refined in ever smaller steps while searching for the lowest CV score. Once the change between iterations becomes minimal, the search stops.
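The search behaviour described above can be imitated with a small base R sketch. It assumes a leave-one-out CV score (the sum of squared prediction errors from Gaussian-kernel weighted regressions) and minimises it with optimize(), which combines golden section search with parabolic interpolation. The simulated data, the helper `cv_score()` and the search interval are hypothetical, and GWmodel's internal routine may differ in detail.

```r
set.seed(1)
n <- 200
coords <- cbind(runif(n, 0, 10000), runif(n, 0, 10000))  # hypothetical projected coordinates (metres)
x <- rnorm(n)
beta <- 1 + coords[, 1] / 5000                           # slope drifts from west to east
y <- 2 + beta * x + rnorm(n, sd = 0.3)
X <- cbind(1, x)
D <- as.matrix(dist(coords))

# Leave-one-out CV score for a given fixed bandwidth
cv_score <- function(bw) {
  res <- numeric(n)
  for (i in seq_len(n)) {
    w <- exp(-0.5 * (D[i, ] / bw)^2)                     # Gaussian kernel weights
    w[i] <- 0                                            # leave observation i out
    fit <- lm.wfit(X, y, w)                              # locally weighted least squares
    res[i] <- y[i] - sum(X[i, ] * fit$coefficients)      # prediction error at location i
  }
  sum(res^2)
}

# Search for the bandwidth that minimises the CV score
opt <- optimize(cv_score, interval = c(1000, 15000))
opt$minimum    # selected bandwidth (metres)
opt$objective  # CV score at that bandwidth
```

As in the log above, the candidate bandwidths move back and forth within a shrinking interval until further refinement no longer changes the score noticeably.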

9.1.1 GWmodel Method - Fixed Bandwidth

The code chunk below calibrates the GWR model using the fixed bandwidth and the Gaussian kernel.

gwr_fixed <- gwr.basic(formula = SELLING_PRICE ~ AREA_SQM + AGE + PROX_CBD + 
                     PROX_CHILDCARE + PROX_ELDERLYCARE  + PROX_URA_GROWTH_AREA + 
                     PROX_MRT   + PROX_PARK + PROX_PRIMARY_SCH + 
                     PROX_SHOPPING_MALL + PROX_BUS_STOP + NO_Of_UNITS + 
                     FAMILY_FRIENDLY + FREEHOLD,
                   data = condo_resale_sf, 
                   bw = bw.fixed,
                   kernel = "gaussian", # has to be used in later steps for consistency
                   longlat = FALSE) # data is projected, so great-circle distances are not calculated

The output is saved in a list of class “gwrm”. The code below can be used to display the model output.

The variables are not changed but the spatial components are accounted for in the calculation for this GWR Model.

gwr_fixed
   ***********************************************************************
   *                       Package   GWmodel                             *
   ***********************************************************************
   Program starts at: 2024-10-17 00:08:29.422253 
   Call:
   gwr.basic(formula = SELLING_PRICE ~ AREA_SQM + AGE + PROX_CBD + 
    PROX_CHILDCARE + PROX_ELDERLYCARE + PROX_URA_GROWTH_AREA + 
    PROX_MRT + PROX_PARK + PROX_PRIMARY_SCH + PROX_SHOPPING_MALL + 
    PROX_BUS_STOP + NO_Of_UNITS + FAMILY_FRIENDLY + FREEHOLD, 
    data = condo_resale_sf, bw = bw.fixed, kernel = "gaussian", 
    longlat = FALSE)

   Dependent (y) variable:  SELLING_PRICE
   Independent variables:  AREA_SQM AGE PROX_CBD PROX_CHILDCARE PROX_ELDERLYCARE PROX_URA_GROWTH_AREA PROX_MRT PROX_PARK PROX_PRIMARY_SCH PROX_SHOPPING_MALL PROX_BUS_STOP NO_Of_UNITS FAMILY_FRIENDLY FREEHOLD
   Number of data points: 1436
   ***********************************************************************
   *                    Results of Global Regression                     *
   ***********************************************************************

   Call:
    lm(formula = formula, data = data)

   Residuals:
     Min       1Q   Median       3Q      Max 
-3470778  -298119   -23481   248917 12234210 

   Coefficients:
                          Estimate Std. Error t value Pr(>|t|)    
   (Intercept)           527633.22  108183.22   4.877 1.20e-06 ***
   AREA_SQM               12777.52     367.48  34.771  < 2e-16 ***
   AGE                   -24687.74    2754.84  -8.962  < 2e-16 ***
   PROX_CBD              -77131.32    5763.12 -13.384  < 2e-16 ***
   PROX_CHILDCARE       -318472.75  107959.51  -2.950 0.003231 ** 
   PROX_ELDERLYCARE      185575.62   39901.86   4.651 3.61e-06 ***
   PROX_URA_GROWTH_AREA   39163.25   11754.83   3.332 0.000885 ***
   PROX_MRT             -294745.11   56916.37  -5.179 2.56e-07 ***
   PROX_PARK             570504.81   65507.03   8.709  < 2e-16 ***
   PROX_PRIMARY_SCH      159856.14   60234.60   2.654 0.008046 ** 
   PROX_SHOPPING_MALL   -220947.25   36561.83  -6.043 1.93e-09 ***
   PROX_BUS_STOP         682482.22  134513.24   5.074 4.42e-07 ***
   NO_Of_UNITS             -245.48      87.95  -2.791 0.005321 ** 
   FAMILY_FRIENDLY       146307.58   46893.02   3.120 0.001845 ** 
   FREEHOLD              350599.81   48506.48   7.228 7.98e-13 ***

   ---Significance stars
   Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
   Residual standard error: 756000 on 1421 degrees of freedom
   Multiple R-squared: 0.6507
   Adjusted R-squared: 0.6472 
   F-statistic: 189.1 on 14 and 1421 DF,  p-value: < 2.2e-16 
   ***Extra Diagnostic information
   Residual sum of squares: 8.120609e+14
   Sigma(hat): 752522.9
   AIC:  42966.76
   AICc:  42967.14
   BIC:  41731.39
   ***********************************************************************
   *          Results of Geographically Weighted Regression              *
   ***********************************************************************

   *********************Model calibration information*********************
   Kernel function: gaussian 
   Fixed bandwidth: 971.3405 
   Regression points: the same locations as observations are used.
   Distance metric: Euclidean distance metric is used.

   ****************Summary of GWR coefficient estimates:******************
                               Min.     1st Qu.      Median     3rd Qu.
   Intercept            -3.5988e+07 -5.1998e+05  7.6780e+05  1.7412e+06
   AREA_SQM              1.0003e+03  5.2758e+03  7.4740e+03  1.2301e+04
   AGE                  -1.3475e+05 -2.0813e+04 -8.6260e+03 -3.7784e+03
   PROX_CBD             -7.7047e+07 -2.3608e+05 -8.3600e+04  3.4646e+04
   PROX_CHILDCARE       -6.0097e+06 -3.3667e+05 -9.7425e+04  2.9007e+05
   PROX_ELDERLYCARE     -3.5000e+06 -1.5970e+05  3.1971e+04  1.9577e+05
   PROX_URA_GROWTH_AREA -3.0170e+06 -8.2013e+04  7.0749e+04  2.2612e+05
   PROX_MRT             -3.5282e+06 -6.5836e+05 -1.8833e+05  3.6922e+04
   PROX_PARK            -1.2062e+06 -2.1732e+05  3.5383e+04  4.1335e+05
   PROX_PRIMARY_SCH     -2.2695e+07 -1.7066e+05  4.8472e+04  5.1555e+05
   PROX_SHOPPING_MALL   -7.2585e+06 -1.6684e+05 -1.0517e+04  1.5923e+05
   PROX_BUS_STOP        -1.4676e+06 -4.5207e+04  3.7601e+05  1.1664e+06
   NO_Of_UNITS          -1.3170e+03 -2.4822e+02 -3.0846e+01  2.5496e+02
   FAMILY_FRIENDLY      -2.2749e+06 -1.1140e+05  7.6214e+03  1.6107e+05
   FREEHOLD             -9.2067e+06  3.8073e+04  1.5169e+05  3.7528e+05
                             Max.
   Intercept            112793548
   AREA_SQM                 21575
   AGE                     434201
   PROX_CBD               2704596
   PROX_CHILDCARE         1654087
   PROX_ELDERLYCARE      38867814
   PROX_URA_GROWTH_AREA  78515730
   PROX_MRT               3124316
   PROX_PARK             18122425
   PROX_PRIMARY_SCH       4637503
   PROX_SHOPPING_MALL     1529952
   PROX_BUS_STOP         11342182
   NO_Of_UNITS              12907
   FAMILY_FRIENDLY        1720744
   FREEHOLD               6073636
   ************************Diagnostic information*************************
   Number of data points: 1436 
   Effective number of parameters (2trace(S) - trace(S'S)): 438.3804 
   Effective degrees of freedom (n-2trace(S) + trace(S'S)): 997.6196 
   AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 42263.61 
   AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 41632.36 
   BIC (GWR book, Fotheringham, et al. 2002,GWR p. 61, eq. 2.34): 42515.71 
   Residual sum of squares: 2.53407e+14 
   R-square value:  0.8909912 
   Adjusted R-square value:  0.8430417 

   ***********************************************************************
   Program stops at: 2024-10-17 00:08:30.52213 

The report shows that the AICc of the GWR model is 42263.61 (under the Diagnostic information section), which is considerably smaller than the AICc of 42967.14 for the global multiple linear regression model.
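
To make concrete what a local calibration does at a single regression point, here is a minimal base R sketch of a Gaussian-weighted least squares fit. Everything in it is hypothetical: the coordinates, the price and area values, and the local_fit() helper are toy constructions rather than the condo data or GWmodel internals. The bandwidth of 971 simply echoes the fixed bandwidth reported above.

```r
set.seed(42)
n <- 200
x_coord <- runif(n, 0, 5000)   # toy projected coordinates in metres
y_coord <- runif(n, 0, 5000)
area <- runif(n, 50, 300)

# Toy price in which the effect of area grows from west to east.
price <- 5000 * area * (1 + x_coord / 5000) + rnorm(n, sd = 50000)
toy <- data.frame(price, area)

# Fit one local regression centred on (px, py): nearby observations get
# weights close to 1, distant observations get weights close to 0.
local_fit <- function(px, py, bw) {
  d <- sqrt((x_coord - px)^2 + (y_coord - py)^2)
  w <- exp(-0.5 * (d / bw)^2)  # Gaussian kernel
  coef(lm(price ~ area, data = toy, weights = w))
}

west <- local_fit(500, 2500, bw = 971)
east <- local_fit(4500, 2500, bw = 971)
print(rbind(west, east))
```

A single global regression would report one area coefficient for the whole study area, whereas the two local fits recover different slopes in the west and in the east. GWR repeats this kind of fit at every regression point.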

9.2 Building Adaptive Bandwidth GWR Model

In this section, the GWR-based hedonic pricing model is calibrated using the adaptive bandwidth approach.

Similar to the earlier section, we will first use bw.gwr() to determine the recommended number of nearest neighbours (data points) to use.

The code chunk looks very similar to the one used to compute the fixed bandwidth, except that the adaptive argument is set to TRUE.

bw.adaptive <- bw.gwr(formula = SELLING_PRICE ~ AREA_SQM + AGE  + 
                        PROX_CBD + PROX_CHILDCARE + PROX_ELDERLYCARE    + 
                        PROX_URA_GROWTH_AREA + PROX_MRT + PROX_PARK + 
                        PROX_PRIMARY_SCH + PROX_SHOPPING_MALL   + PROX_BUS_STOP + 
                        NO_Of_UNITS + FAMILY_FRIENDLY + FREEHOLD, 
                      data = condo_resale_sf, 
                      approach = "CV", 
                      kernel = "gaussian", 
                      adaptive = TRUE, 
                      longlat = FALSE)
Adaptive bandwidth: 895 CV score: 7.952401e+14 
Adaptive bandwidth: 561 CV score: 7.667364e+14 
Adaptive bandwidth: 354 CV score: 6.953454e+14 
Adaptive bandwidth: 226 CV score: 6.15223e+14 
Adaptive bandwidth: 147 CV score: 5.674373e+14 
Adaptive bandwidth: 98 CV score: 5.426745e+14 
Adaptive bandwidth: 68 CV score: 5.168117e+14 
Adaptive bandwidth: 49 CV score: 4.859631e+14 
Adaptive bandwidth: 37 CV score: 4.646518e+14 
Adaptive bandwidth: 30 CV score: 4.422088e+14 
Adaptive bandwidth: 25 CV score: 4.430816e+14 
Adaptive bandwidth: 32 CV score: 4.505602e+14 
Adaptive bandwidth: 27 CV score: 4.462172e+14 
Adaptive bandwidth: 30 CV score: 4.422088e+14 

The recommended adaptive bandwidth is 30 nearest neighbours: at each regression point, the kernel bandwidth is set to the distance to the 30th nearest data point.
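
As a rough illustration of what an adaptive bandwidth of 30 means for one regression point, the base R sketch below derives the local bandwidth and the Gaussian weights from toy coordinates. The coordinates are hypothetical, and counting the point itself among the 30 nearest observations is an assumption made here for simplicity, not a statement about GWmodel's internals.

```r
set.seed(1)
# Hypothetical projected coordinates in metres for 100 observations.
coords <- cbind(runif(100, 0, 10000), runif(100, 0, 10000))
k <- 30

# Distances from the first observation to every observation (itself included).
d <- sqrt((coords[, 1] - coords[1, 1])^2 + (coords[, 2] - coords[1, 2])^2)

bw_local <- sort(d)[k]               # distance to the k-th nearest observation
w <- exp(-0.5 * (d / bw_local)^2)    # Gaussian kernel weights

print(bw_local)
print(round(range(w), 4))
```

With a Gaussian kernel the weights never reach exactly zero, so the 30 neighbours set the scale of the kernel rather than a hard cut-off: an observation sitting exactly at the local bandwidth receives a weight of exp(-0.5), roughly 0.61.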

Next, the GWR-based hedonic pricing model is calibrated using the adaptive bandwidth and a Gaussian kernel, as shown in the code chunk below.

gwr_adaptive <- gwr.basic(formula = SELLING_PRICE ~ AREA_SQM + AGE + 
                            PROX_CBD + PROX_CHILDCARE + PROX_ELDERLYCARE + 
                            PROX_URA_GROWTH_AREA + PROX_MRT + PROX_PARK + 
                            PROX_PRIMARY_SCH + PROX_SHOPPING_MALL + PROX_BUS_STOP + 
                            NO_Of_UNITS + FAMILY_FRIENDLY + FREEHOLD, 
                          data = condo_resale_sf, bw = bw.adaptive, 
                          kernel = 'gaussian', 
                          adaptive=TRUE, 
                          longlat = FALSE)

The code below can be used to display the model output.

gwr_adaptive
   ***********************************************************************
   *                       Package   GWmodel                             *
   ***********************************************************************
   Program starts at: 2024-10-17 00:08:38.782278 
   Call:
   gwr.basic(formula = SELLING_PRICE ~ AREA_SQM + AGE + PROX_CBD + 
    PROX_CHILDCARE + PROX_ELDERLYCARE + PROX_URA_GROWTH_AREA + 
    PROX_MRT + PROX_PARK + PROX_PRIMARY_SCH + PROX_SHOPPING_MALL + 
    PROX_BUS_STOP + NO_Of_UNITS + FAMILY_FRIENDLY + FREEHOLD, 
    data = condo_resale_sf, bw = bw.adaptive, kernel = "gaussian", 
    adaptive = TRUE, longlat = FALSE)

   Dependent (y) variable:  SELLING_PRICE
   Independent variables:  AREA_SQM AGE PROX_CBD PROX_CHILDCARE PROX_ELDERLYCARE PROX_URA_GROWTH_AREA PROX_MRT PROX_PARK PROX_PRIMARY_SCH PROX_SHOPPING_MALL PROX_BUS_STOP NO_Of_UNITS FAMILY_FRIENDLY FREEHOLD
   Number of data points: 1436
   ***********************************************************************
   *                    Results of Global Regression                     *
   ***********************************************************************

   Call:
    lm(formula = formula, data = data)

   Residuals:
     Min       1Q   Median       3Q      Max 
-3470778  -298119   -23481   248917 12234210 

   Coefficients:
                          Estimate Std. Error t value Pr(>|t|)    
   (Intercept)           527633.22  108183.22   4.877 1.20e-06 ***
   AREA_SQM               12777.52     367.48  34.771  < 2e-16 ***
   AGE                   -24687.74    2754.84  -8.962  < 2e-16 ***
   PROX_CBD              -77131.32    5763.12 -13.384  < 2e-16 ***
   PROX_CHILDCARE       -318472.75  107959.51  -2.950 0.003231 ** 
   PROX_ELDERLYCARE      185575.62   39901.86   4.651 3.61e-06 ***
   PROX_URA_GROWTH_AREA   39163.25   11754.83   3.332 0.000885 ***
   PROX_MRT             -294745.11   56916.37  -5.179 2.56e-07 ***
   PROX_PARK             570504.81   65507.03   8.709  < 2e-16 ***
   PROX_PRIMARY_SCH      159856.14   60234.60   2.654 0.008046 ** 
   PROX_SHOPPING_MALL   -220947.25   36561.83  -6.043 1.93e-09 ***
   PROX_BUS_STOP         682482.22  134513.24   5.074 4.42e-07 ***
   NO_Of_UNITS             -245.48      87.95  -2.791 0.005321 ** 
   FAMILY_FRIENDLY       146307.58   46893.02   3.120 0.001845 ** 
   FREEHOLD              350599.81   48506.48   7.228 7.98e-13 ***

   ---Significance stars
   Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
   Residual standard error: 756000 on 1421 degrees of freedom
   Multiple R-squared: 0.6507
   Adjusted R-squared: 0.6472 
   F-statistic: 189.1 on 14 and 1421 DF,  p-value: < 2.2e-16 
   ***Extra Diagnostic information
   Residual sum of squares: 8.120609e+14
   Sigma(hat): 752522.9
   AIC:  42966.76
   AICc:  42967.14
   BIC:  41731.39
   ***********************************************************************
   *          Results of Geographically Weighted Regression              *
   ***********************************************************************

   *********************Model calibration information*********************
   Kernel function: gaussian 
   Adaptive bandwidth: 30 (number of nearest neighbours)
   Regression points: the same locations as observations are used.
   Distance metric: Euclidean distance metric is used.

   ****************Summary of GWR coefficient estimates:******************
                               Min.     1st Qu.      Median     3rd Qu.
   Intercept            -1.3487e+08 -2.4669e+05  7.7928e+05  1.6194e+06
   AREA_SQM              3.3188e+03  5.6285e+03  7.7825e+03  1.2738e+04
   AGE                  -9.6746e+04 -2.9288e+04 -1.4043e+04 -5.6119e+03
   PROX_CBD             -2.5330e+06 -1.6256e+05 -7.7242e+04  2.6624e+03
   PROX_CHILDCARE       -1.2790e+06 -2.0175e+05  8.7158e+03  3.7778e+05
   PROX_ELDERLYCARE     -1.6212e+06 -9.2050e+04  6.1029e+04  2.8184e+05
   PROX_URA_GROWTH_AREA -7.2686e+06 -3.0350e+04  4.5869e+04  2.4613e+05
   PROX_MRT             -4.3781e+07 -6.7282e+05 -2.2115e+05 -7.4593e+04
   PROX_PARK            -2.9020e+06 -1.6782e+05  1.1601e+05  4.6572e+05
   PROX_PRIMARY_SCH     -8.6418e+05 -1.6627e+05 -7.7853e+03  4.3222e+05
   PROX_SHOPPING_MALL   -1.8272e+06 -1.3175e+05 -1.4049e+04  1.3799e+05
   PROX_BUS_STOP        -2.0579e+06 -7.1461e+04  4.1104e+05  1.2071e+06
   NO_Of_UNITS          -2.1993e+03 -2.3685e+02 -3.4699e+01  1.1657e+02
   FAMILY_FRIENDLY      -5.9879e+05 -5.0927e+04  2.6173e+04  2.2481e+05
   FREEHOLD             -1.6340e+05  4.0765e+04  1.9023e+05  3.7960e+05
                            Max.
   Intercept            18758355
   AREA_SQM                23064
   AGE                     13303
   PROX_CBD             11346650
   PROX_CHILDCARE        2892127
   PROX_ELDERLYCARE      2465671
   PROX_URA_GROWTH_AREA  7384059
   PROX_MRT              1186242
   PROX_PARK             2588497
   PROX_PRIMARY_SCH      3381462
   PROX_SHOPPING_MALL   38038564
   PROX_BUS_STOP        12081592
   NO_Of_UNITS              1010
   FAMILY_FRIENDLY       2072414
   FREEHOLD              1813995
   ************************Diagnostic information*************************
   Number of data points: 1436 
   Effective number of parameters (2trace(S) - trace(S'S)): 350.3088 
   Effective degrees of freedom (n-2trace(S) + trace(S'S)): 1085.691 
   AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 41982.22 
   AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 41546.74 
   BIC (GWR book, Fotheringham, et al. 2002,GWR p. 61, eq. 2.34): 41914.08 
   Residual sum of squares: 2.528227e+14 
   R-square value:  0.8912425 
   Adjusted R-square value:  0.8561185 

   ***********************************************************************
   Program stops at: 2024-10-17 00:08:40.196155 

The report shows that the AICc of the adaptive bandwidth GWR is 41982.22, which is even smaller than the AICc of the fixed bandwidth GWR (42263.61).
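
The three AICc values quoted in the reports can be put side by side with a few lines of base R. The numbers below are copied by hand from the printed outputs above; the vector names are labels chosen for this sketch.

```r
# AICc values copied from the model reports above.
aicc <- c(global_mlr   = 42967.14,
          gwr_fixed    = 42263.61,
          gwr_adaptive = 41982.22)

delta <- aicc - min(aicc)          # gap to the best (lowest) AICc
best  <- names(which.min(aicc))

print(round(delta, 2))
print(best)
```

A lower AICc indicates a better trade-off between fit and model complexity, so on this criterion the adaptive bandwidth model is preferred, with the fixed bandwidth model in between and the global model last.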

9.2.1 Visualising GWR Output

In addition to regression residuals, the output feature class table includes fields for observed and predicted y values, condition number (cond), Local R2, residuals, and explanatory variable coefficients and standard errors:

  • Condition Number: This diagnostic assesses local collinearity in the model. When local collinearity is high, the results may become unstable. A condition number greater than 30 suggests that the results may be unreliable.

  • Local R²: This metric ranges from 0.0 to 1.0 and indicates how well the local regression model fits the observed y values. Low values suggest poor model performance in certain areas. Mapping Local R² can highlight where the Geographically Weighted Regression (GWR) performs well or poorly, offering insights into potentially missing variables.

  • Predicted Values: These are the estimated y values generated by the GWR model, representing the fitted values.

  • Residuals: Residuals are calculated by subtracting the predicted y values from the observed y values. Standardized residuals, which have a mean of zero and a standard deviation of 1, can be visualized on a cold-to-hot color scale, indicating areas of under- or over-prediction.

  • Coefficient Standard Error: This measures the reliability of each coefficient estimate. Smaller standard errors relative to the coefficient values suggest greater confidence in the estimates, while large standard errors may indicate issues with local collinearity.

These are all stored in the SDF element of the output list: a SpatialPointsDataFrame or SpatialPolygonsDataFrame object whose “data” slot holds the fit points, GWR coefficient estimates, y values, predicted values, coefficient standard errors and t-values.
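
As a small worked example of the residual definition above, the sketch below recomputes residuals from the first four y and yhat values that appear in the glimpse() output later in this section. The z-score at the end is a plain standardisation over these four rows for illustration only; it is not the Stud_residual field that GWmodel reports.

```r
# First four observed and predicted values, copied from the glimpse() output.
y    <- c(3000000, 3880000, 3325000, 4250000)
yhat <- c(2886531.8, 3466801.5, 3616527.2, 5435481.6)

residual <- y - yhat   # positive: under-prediction, negative: over-prediction

# Simple standardisation (mean 0, standard deviation 1) over these rows only.
z <- (residual - mean(residual)) / sd(residual)

print(residual)
print(round(z, 3))
```

The recomputed residuals agree with the residual column in the output up to the rounding of the printed yhat values.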

9.2.2 Converting SDF into sf data.frame

To visualise the fields in SDF, we need to first convert it into sf data.frame by using the code chunk below:

gwr_adaptive_output <- as.data.frame(
  gwr_adaptive$SDF) %>%
  select(-c(2:12)) # exclude columns 2 to 12
gwr_sf_adaptive <- cbind(condo_resale_sf,
                         gwr_adaptive_output)

Next, glimpse() is used to display the content of the gwr_sf_adaptive sf data frame.

glimpse(gwr_sf_adaptive)
Rows: 1,436
Columns: 66
$ nb                      <nb> <66, 77, 123, 238, 239, 343>, <21, 162, 163, 19…
$ wt                      <list> <0.1666667, 0.1666667, 0.1666667, 0.1666667, …
$ POSTCODE                <dbl> 118635, 288420, 267833, 258380, 467169, 466472…
$ SELLING_PRICE           <dbl> 3000000, 3880000, 3325000, 4250000, 1400000, 1…
$ AREA_SQM                <dbl> 309, 290, 248, 127, 145, 139, 218, 141, 165, 1…
$ AGE                     <dbl> 30, 32, 33, 7, 28, 22, 24, 24, 27, 31, 17, 22,…
$ PROX_CBD                <dbl> 7.941259, 6.609797, 6.898000, 4.038861, 11.783…
$ PROX_CHILDCARE          <dbl> 0.16597932, 0.28027246, 0.42922669, 0.39473543…
$ PROX_ELDERLYCARE        <dbl> 2.5198118, 1.9333338, 0.5021395, 1.9910316, 1.…
$ PROX_URA_GROWTH_AREA    <dbl> 6.618741, 7.505109, 6.463887, 4.906512, 6.4106…
$ PROX_HAWKER_MARKET      <dbl> 1.76542207, 0.54507614, 0.37789301, 1.68259969…
$ PROX_KINDERGARTEN       <dbl> 0.05835552, 0.61592412, 0.14120309, 0.38200076…
$ PROX_MRT                <dbl> 0.5607188, 0.6584461, 0.3053433, 0.6910183, 0.…
$ PROX_PARK               <dbl> 1.1710446, 0.1992269, 0.2779886, 0.9832843, 0.…
$ PROX_PRIMARY_SCH        <dbl> 1.6340256, 0.9747834, 1.4715016, 1.4546324, 0.…
$ PROX_TOP_PRIMARY_SCH    <dbl> 3.3273195, 0.9747834, 1.4715016, 2.3006394, 0.…
$ PROX_SHOPPING_MALL      <dbl> 2.2102717, 2.9374279, 1.2256850, 0.3525671, 1.…
$ PROX_SUPERMARKET        <dbl> 0.9103958, 0.5900617, 0.4135583, 0.4162219, 0.…
$ PROX_BUS_STOP           <dbl> 0.10336166, 0.28673408, 0.28504777, 0.29872340…
$ NO_Of_UNITS             <dbl> 18, 20, 27, 30, 30, 31, 32, 32, 32, 32, 34, 34…
$ FAMILY_FRIENDLY         <dbl> 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0…
$ FREEHOLD                <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1…
$ LEASEHOLD_99YR          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ MLR_RES                 <dbl> -1489099.55, 415494.57, 194129.69, 1088992.71,…
$ Intercept               <dbl> 2050011.67, 1633128.24, 3433608.17, 234358.91,…
$ NO_Of_UNITS.1           <dbl> 104.8290640, -288.3441183, -9.5532945, -161.35…
$ FAMILY_FRIENDLY.1       <dbl> -9075.370, 310074.664, 5949.746, 1556178.531, …
$ FREEHOLD.1              <dbl> 303955.61, 396221.27, 168821.75, 1212515.58, 3…
$ y                       <dbl> 3000000, 3880000, 3325000, 4250000, 1400000, 1…
$ yhat                    <dbl> 2886531.8, 3466801.5, 3616527.2, 5435481.6, 13…
$ residual                <dbl> 113468.16, 413198.52, -291527.20, -1185481.63,…
$ CV_Score                <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ Stud_residual           <dbl> 0.38207013, 1.01433140, -0.83780678, -2.846146…
$ Intercept_SE            <dbl> 516105.5, 488083.5, 963711.4, 444185.5, 211962…
$ AREA_SQM_SE             <dbl> 823.2860, 825.2380, 988.2240, 617.4007, 1376.2…
$ AGE_SE                  <dbl> 5889.782, 6226.916, 6510.236, 6010.511, 8180.3…
$ PROX_CBD_SE             <dbl> 37411.22, 23615.06, 56103.77, 469337.41, 41064…
$ PROX_CHILDCARE_SE       <dbl> 319111.1, 299705.3, 349128.5, 304965.2, 698720…
$ PROX_ELDERLYCARE_SE     <dbl> 120633.34, 84546.69, 129687.07, 127150.69, 327…
$ PROX_URA_GROWTH_AREA_SE <dbl> 56207.39, 76956.50, 95774.60, 470762.12, 47433…
$ PROX_MRT_SE             <dbl> 185181.3, 281133.9, 275483.7, 279877.1, 363830…
$ PROX_PARK_SE            <dbl> 205499.6, 229358.7, 314124.3, 227249.4, 364580…
$ PROX_PRIMARY_SCH_SE     <dbl> 152400.7, 165150.7, 196662.6, 240878.9, 249087…
$ PROX_SHOPPING_MALL_SE   <dbl> 109268.8, 98906.8, 119913.3, 177104.1, 301032.…
$ PROX_BUS_STOP_SE        <dbl> 600668.6, 410222.1, 464156.7, 562810.8, 740922…
$ NO_Of_UNITS_SE          <dbl> 218.1258, 208.9410, 210.9828, 361.7767, 299.50…
$ FAMILY_FRIENDLY_SE      <dbl> 131474.73, 114989.07, 146607.22, 108726.62, 16…
$ FREEHOLD_SE             <dbl> 115954.0, 130110.0, 141031.5, 138239.1, 210641…
$ Intercept_TV            <dbl> 3.9720784, 3.3460017, 3.5629010, 0.5276150, 1.…
$ AREA_SQM_TV             <dbl> 11.614302, 20.087361, 13.247868, 33.577223, 4.…
$ AGE_TV                  <dbl> -1.6154474, -9.3441881, -4.1023685, -15.524301…
$ PROX_CBD_TV             <dbl> -3.22582173, -6.32792021, -4.62353528, 5.17080…
$ PROX_CHILDCARE_TV       <dbl> 1.000488185, 1.471786337, -0.344047555, 1.5766…
$ PROX_ELDERLYCARE_TV     <dbl> -3.26126929, 3.84626245, 4.13191383, 2.4756745…
$ PROX_URA_GROWTH_AREA_TV <dbl> -2.846248368, -1.848971738, -2.648105057, -5.6…
$ PROX_MRT_TV             <dbl> -1.61864578, -8.92998600, -3.40075727, -7.2870…
$ PROX_PARK_TV            <dbl> -0.83749312, 2.28192684, 0.66565951, -3.340617…
$ PROX_PRIMARY_SCH_TV     <dbl> 1.59230221, 6.70194543, 2.90580089, 12.9836104…
$ PROX_SHOPPING_MALL_TV   <dbl> 2.753588422, -0.886626400, -1.056869486, -0.16…
$ PROX_BUS_STOP_TV        <dbl> 2.0154464, 4.4941192, 3.0419145, 12.8383775, 0…
$ NO_Of_UNITS_TV          <dbl> 0.480589953, -1.380026395, -0.045279967, -0.44…
$ FAMILY_FRIENDLY_TV      <dbl> -0.06902748, 2.69655779, 0.04058290, 14.312764…
$ FREEHOLD_TV             <dbl> 2.6213469, 3.0452799, 1.1970499, 8.7711485, 1.…
$ Local_R2                <dbl> 0.8846744, 0.8899773, 0.8947007, 0.9073605, 0.…
$ geometry                <POINT [m]> POINT (22085.12 29951.54), POINT (25656.…
$ geometry.1              <POINT [m]> POINT (22085.12 29951.54), POINT (25656.…
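
The _SE and _TV fields above are related in a simple way: each t-value is the local coefficient divided by its standard error. The sketch below checks this for the first four Intercept values, copied by hand from the glimpse() output, and flags local estimates with an absolute t-value above 1.96. That cut-off is the usual two-sided 5% normal threshold and is an assumption of this sketch; it makes no adjustment for the large number of local tests.

```r
# First four values copied from the glimpse() output above.
intercept    <- c(2050011.67, 1633128.24, 3433608.17, 234358.91)
intercept_se <- c(516105.5, 488083.5, 963711.4, 444185.5)

intercept_tv <- intercept / intercept_se   # local t-values
significant  <- abs(intercept_tv) > 1.96   # unadjusted 5% threshold

print(round(intercept_tv, 4))
print(significant)
```

The recomputed values match the Intercept_TV column to within rounding. The same ratio can be mapped for any coefficient to show where a variable's local effect is distinguishable from zero.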

The summary() function is used in the code chunk below to summarise the predicted values (yhat).

summary(gwr_adaptive$SDF$yhat)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
  171347  1102001  1385528  1751842  1982307 13887901 

9.2.3 Visualising local R2

The code chunk below is used to create an interactive point symbol map.

tmap_mode("view")
tmap mode set to interactive viewing
tmap_options(check.and.fix = TRUE)
tm_shape(mpsz)+
  tm_polygons(alpha = 0.1) +
tm_shape(gwr_sf_adaptive) +  
  tm_dots(col = "Local_R2",
          border.col = "gray60",
          border.lwd = 1) +
  tm_view(set.zoom.limits = c(11,14))
Warning: The shape mpsz is invalid (after reprojection). See sf::st_is_valid
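
The warning above points to sf::st_is_valid. As a self-contained illustration of what an invalid geometry is and how sf can repair one, the sketch below builds a hypothetical self-intersecting "bowtie" polygon; it does not use mpsz. Whether an explicit repair such as st_make_valid(mpsz) would silence this particular warning is untested here, since the message refers to the shape after reprojection.

```r
library(sf)

# Hypothetical bowtie polygon: the ring crosses itself, so it is invalid.
bowtie <- st_sfc(st_polygon(list(rbind(c(0, 0), c(1, 1), c(1, 0),
                                       c(0, 1), c(0, 0)))))

print(st_is_valid(bowtie))     # validity check before repair

repaired <- st_make_valid(bowtie)
print(st_is_valid(repaired))   # validity check after repair
```

In the exercise itself, tmap_options(check.and.fix = TRUE) asks tmap to attempt this kind of repair automatically when drawing the map.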

Switching the mode back to plot

tmap_mode("plot")
tmap mode set to plotting

9.2.4 Visualising Coefficient Estimates

The code chunk below creates interactive point symbol maps of the standard errors and t-values of the AREA_SQM coefficient estimates.

tmap_options(check.and.fix = TRUE)
tmap_mode("view")
tmap mode set to interactive viewing
AREA_SQM_SE <- tm_shape(mpsz)+
  tm_polygons(alpha = 0.1) +
tm_shape(gwr_sf_adaptive) +  
  tm_dots(col = "AREA_SQM_SE",
          border.col = "gray60",
          border.lwd = 1) +
  tm_view(set.zoom.limits = c(11,14))

AREA_SQM_TV <- tm_shape(mpsz)+
  tm_polygons(alpha = 0.1) +
tm_shape(gwr_sf_adaptive) +  
  tm_dots(col = "AREA_SQM_TV",
          border.col = "gray60",
          border.lwd = 1) +
  tm_view(set.zoom.limits = c(11,14))

tmap_arrange(AREA_SQM_SE, AREA_SQM_TV, 
             asp=1, ncol=2,
             sync = TRUE)
Warning: The shape mpsz is invalid (after reprojection). See sf::st_is_valid
Warning: The shape mpsz is invalid (after reprojection). See sf::st_is_valid

Switching the mode back to plot

tmap_mode("plot")
tmap mode set to plotting

9.2.5 Visualising by URA Planning Region

tm_shape(mpsz[mpsz$REGION_N=="CENTRAL REGION", ])+
  tm_polygons()+
tm_shape(gwr_sf_adaptive) + 
  tm_bubbles(col = "Local_R2",
           size = 0.15,
           border.col = "gray60",
           border.lwd = 1)
Warning: The shape mpsz[mpsz$REGION_N == "CENTRAL REGION", ] is invalid. See
sf::st_is_valid

10 Conclusion

This in-class exercise primarily uses the sfdep package instead of the spdep package used in the hands-on exercise.

END